Computational Lens on Godel's Incompleteness Theorems, Part 1
Computational Lens on Godel's Incompleteness Theorems, Part 1
Computational Lens on Godel's Incompleteness Theorems, Part 1
Computational Lens on Godel's Incompleteness Theorems, Part 1

This two-part blog post has three main goals.

  • The first goal is to demonstrate that the proofs of Gödel’s incompleteness theorems (both the first and the second one) can be naturally expressed using the language of the theory of computation (henceforth ToC). This is because one of the defining features of mathematical reasoning is that it is a computation.

  • The second goal is to show that the incompleteness theorems can be understood without the knowledge of any particular formalization of mathematics (like first-order logic and the axiomatic systems PA or ZFC). This is similar to being able to understand the undecidability of the halting problem without the knowledge of any particular formalization of computation (like Turing machines or lambda calculus).

  • The third goal is to highlight that the use of the language of ToC leads to relatively intuitive proofs of the incompleteness theorems. In fact, we hope that you will feel you could have come up with the results yourself.

Gödel’s incompleteness theorems are mathematically and philosophically some of the most important intellectual discoveries, highlighting the inherent limits of mathematical reasoning. As we will see, the limits of mathematical reasoning can be viewed as a corollary of the limits of computation. And at its root, the limits of computation (and therefore the limits of mathematical reasoning) are about the difference between the finite and the infinite. The foundational work of Georg Cantor in understanding infinity will be our main hammer to prove the limits of computation and mathematics.


This first blog post is about providing the right background for the reader. After talking about formalization of computation, we introduce Cantor’s work and explain how it directly implies limits of computation. In the next post, we will talk about formalization of mathematical reasoning and then prove the incompleteness theorems.

Basics of Computation

In its most general form, computation is manipulation of information/data. Usually there is some input data, which is manipulated/processed in some way to produce output data.


Given this general description, we can see computation pretty much everywhere. That being said, Alan Turing, in coming up with his computing model, was not really thinking about how information is manipulated, say, by black holes. He had a much narrower focus. He wanted to model “computers”, which before the invention of modern computers, referred to people trained in carrying out calculations. These computers would have a set of instructions (i.e. an algorithm) at hand. They would then carry out the given instructions on various input data to produce the output data.


Even though we did not have a formal way of representing algorithms until the 20th century, people have been discovering and using algorithms for millennia (e.g. the grade-school multiplication algorithm, Euclid’s algorithm for computing the greatest common divisor of two numbers, and many many others). We are going to call these informal representations of algorithms “Good Old Regular Algorithms”, or GORA for short. A computational model is a mathematical formalization of GORA such that for every algorithm, there is a precise representation of the algorithm in the computational model.

An important component of a computational model is a precise representation of data. In ToC, we represent any kind/type of data using just a single type, string. Let \(\Sigma\) be some non-empty finite set that we think of as a set of symbols. We call \(\Sigma\) an alphabet. Then \(\Sigma^*\) denotes the set of all finite-length strings over \(\Sigma\). We think of the input data as well as the output data for an algorithm as elements of \(\Sigma^*\). (The finiteness restriction is a reasonable one because we don’t know a way to store or process infinite amount of data.)

Given a set \(X\), we say \(X\) is encodable if there is an injective function \(\text{Enc}: X \to \Sigma^*\) for some alphabet \(\Sigma\). We use the notation \(\left\langle x \right\rangle\) to denote the encoding of \(x \in X\). If we want our algorithm to take as input some object \(x\) that is not a string (say a number), we would first encode it and view \(\left\langle x \right\rangle\) as the input.

We often think of an algorithm as solving some problem. A function problem is a function \(g: \Sigma^* \to \Sigma^*\). For example, in the multiplication problem, if the input is the encoding of a pair of numbers, the output is the encoding of the product of those numbers. A decision problem is a function of the form \(f: \Sigma^* \to \{0,1\}\), representing a problem with a binary output. We can equivalently think of the output as \(\{\text{False}, \text{True}\}\) or \(\{\text{Reject}, \text{Accept}\}\). In ToC, we often restrict our attention to decision problems since more general computational problems often have an “equivalent” decision version. One of the most basic questions in ToC is whether all decision problems are computable/decidable or not.

For the formal representation of algorithms we’ll use Turing machines. Every Turing machine (TM) represents an algorithm. And every algorithm has a corresponding Turing machine. This correspondence is known as the Church-Turing Thesis.

The Church-Turing Thesis (GORA-to-TM Thesis)

The Turing machine computational model is a mathematical formalization of GORA. Any algorithm can be expressed using a TM. Or in other words, every algorithm “compiles down” to a TM.

When we are proving properties of algorithms/computation, we do not directly work with TMs, but rather appeal to the Church-Turing thesis: we work with high-level descriptions of algorithms with the understanding that they can be translated to TMs. Thanks to the Church-Turing thesis, we won’t bother defining the TM model formally. We don’t need it. We’ll describe all our TMs using algorithms with high-level instructions.

It is not clear whether Turing (and others at the time) really appreciated the full reach of the Turing machine model. Even though Turing didn’t build his mathematical model of computation thinking about the fundamental laws of physics, today, our understanding of computation, as well as physics, lead us to make a stronger claim.

The Physical Church-Turing Thesis

Any computation that can be realized in the universe (only constrained by the laws of physics) can be expressed using a TM.

Our focus will be on TMs computing/deciding decision problems \(f : \Sigma^* \to \{0,1\}\). So given a TM \(M\) and an input string \(x\), we’ll write \(M(x) = 1\) if \(M\) returns True (equivalently, “accepts”) when the input is \(x\). We’ll write \(M(x) = 0\) if \(M\) returns False (i.e. “rejects”). And we’ll write \(M(x) = \infty\) if \(M\) never stops running and gets stuck in an infinite loop. A TM that halts (returns True or False) on all inputs is called a decider TM. If the input/output behavior of a TM \(M\) exactly matches a decision problem \(f\) (i.e. for all \(x\), \(M(x) = f(x)\)), then we say \(M\) computes/decides \(f\). And we call \(f\) a decidable decision problem.

One defining feature of a TM (and a feature that will play a crucial role in our arguments) is that it is a finite object. This means every TM has a finite-length string representation. Or in other words, the set of all TMs is an encodable set. As with the finiteness of strings, the finiteness of TMs/algorithms is a reasonable restriction since we can’t physically realize an infinite object. Given a TM \(M\), \(\left\langle M \right\rangle\) denotes its encoding.

The fact that we can encode a Turing machine means that an input to a TM can be the encoding of another TM (in fact, a TM can take the encoding of itself as the input). One important implication of this (that Turing pointed out in his paper) is the existence of a universal Turing machine, which given as input the encoding of any Turing machine \(M\), and some string \(x\), can simulate \(M\) when it is given the input \(x\).


The existence of such a machine shouldn’t be surprising (once you have the right perspective). If we consider the human computers that Turing was modeling, and think about the role of the human in the process, we see that the human is the universal TM. We just have to change our viewpoint slightly and think of the instructions as input.


The high-level description of a universal Turing machine \(U\) is as follows. \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; U(\left\langle \text{TM } M, \; \text{string } x \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Simulate $M$ on input $x$.}\\ &{\scriptstyle 2.}~~~~\texttt{Return the value of $M(x)$.}\\ \hline\hline\end{aligned}\] Note that if \(M(x)\) loops forever, then \(U\) loops forever as well.

Above, when denoting the encoding of a tuple \((M, x)\) where \(M\) is a Turing machine and \(x\) is a string, we used \(\left\langle \text{TM } M, \text{string } x \right\rangle\) to make clear what the types of \(M\) and \(x\) are. When we give a high-level description of a TM, we often assume that the input given is of the correct form/type. But technically, the input is allowed to be any finite-length string. Even though this is not explicitly written, we will implicitly assume that the first thing our machine does is check whether the input is a valid encoding of objects with the expected types. If it is not, the machine rejects (returns False). If it is, then it will carry on with the specified instructions.

From now on, in the context of TMs and decision problems, we will fix our alphabet to be \(\Sigma = \{0,1\}\). So a decision problem is of the form \(f: \{0,1\}^* \to \{0,1\}\).

Cantor and Diagonalization

An important part of mathematics is about understanding the relationships among sets. Is a set \(X\) equal to another set \(Y\)? Are they equal in cardinality? Can we find an element in \(X\) that is not in \(Y\) (or vice versa)? Many fundamental questions like these involve infinite sets. For instance, an interesting open question in the 18th century and early 19th century was whether the set of real algebraic numbers (i.e. real numbers that can be expressed as a root of a non-zero polynomial in one variable with integer coefficients) is equal to the set of all real numbers. In the previous section, we have come across another question of this flavor. Is the set of all decidable decision problems equal to the set of all decision problems? In fact, ToC is full of these types of questions. For example, one of the major goals is to classify problems according to the resources needed to solve them, i.e. to define complexity classes, and then try to understand the relationships among the complexity classes.

We need some general techniques to compare infinite sets, and for this, we go to Georg Cantor, who had two major insights.

The first major insight is that bijections, injections and surjections can be used to compare infinite sets. In fact, the most meaningful way to define \(|X| \leq |Y|\) and \(|X| = |Y|\) for infinite sets is to use injections and bijections. In particular, we’ll write \(|X| = |Y|\) if there is a bijection between \(X\) and \(Y\). We’ll write \(|X| \leq |Y|\) (or \(|Y| \geq |X|\)) if there is an injection from \(X\) to \(Y\). And we’ll write \(|X| < |Y|\) if there is no injection from \(Y\) to \(X\).

Many familiar infinite sets have a bijection between them. E.g. \[|\mathbb N| = |\mathbb Z| = |\mathbb Z\times \mathbb Z| = |\mathbb Q| = |\text{Primes}| = |\{0,1\}^*| = |\Sigma^*|,\] where \(\Sigma\) is any finite, non-empty alphabet. It is a nice exercise to show that if a set \(S\) is infinite, then \(|S| \geq |\mathbb N|\). So there are three categories of sets:

  1. finite sets,

  2. sets \(S\) such that \(|S| = |X|\), where \(X\) is any of the sets listed above (like \(\mathbb N\)),

  3. all other sets (i.e. sets \(S\) with \(|S| > |X|\), where \(X\) is any of the sets listed above).

It makes sense to give a name for these different categories. The first two categories combined is known as countable sets. The second category is known as countably infinite sets. And the third category is known as uncountable sets.

We can define a countable set as any set \(S\) with \(|S| \leq |X|\), where \(X = \mathbb N\). But the choice of \(\mathbb N\) here is somewhat arbitrary. We can also choose, for instance, \(X = \Sigma^*\), in which case we see countability is equivalent to encodability. So:

Countability heuristic

A set is countable if its elements have a finite string representation/description.

Encodability captures the essence of countable sets really well. For instance, it is easy to see that for all the sets listed above, the elements have finite string representations. And if you consider another set like \(\{0,1\}^\infty\), which is defined to be the set of all infinite-length binary strings, it is almost by definition that the set is not countable/encodable (though this still requires a proof). Furthermore, encodability highlights that, just like the main difference between the first two categories (finite sets and countably infinite sets) is about the finite vs the infinite, the main difference between countably infinite sets and uncountable sets is also about the finite vs the infinite! In the former, we look at the set themselves, and in the latter, we look at the elements of the sets.

How can we prove that a set like \(\{0,1\}^\infty\) is in fact uncountable? This brings us to the second major insight of Cantor, diagonalization, which is one of the most important proof techniques in mathematics. In fact, diagonalization is the root of all the results that we will present. Perhaps its power and reach is surprising given that the idea is actually pretty straightforward, which we describe now.

The basic question that diagonalization is trying to answer is the following.

Given a set of objects \(\mathcal{F}\) that is part of a larger universe \(\mathcal{U}\) (so \(\mathcal{F}\subseteq \mathcal{U}\)), is \(\mathcal{F}= \mathcal{U}\)? And if \(\mathcal{F}\neq \mathcal{U}\), can we construct an element \(D \in \mathcal{U}\setminus \mathcal{F}\)?

Below are some examples of important questions in this flavor. Diagonalization can be used to answer all of them.

  • Let \(\mathcal{F}\) be the set of rational numbers and \(\mathcal{U}\) be the set of reals. Is there an irrational number, and if there is, can we construct one?

  • It is not hard to give a direct proof that \(\sqrt{2}\) is irrational, so the above question may not be terribly interesting. Here is a much more non-trivial question. Let \(\mathcal{F}\) be the set of real algebraic numbers, and \(\mathcal{U}\) be the set of reals. Is there a real non-algebraic number (i.e. is there a real transcendental number)? And if there is, can we construct one?

  • Let \(\mathcal{F}\) be the set of all decidable decision problems, and let \(\mathcal{U}\) be the set of all decision problems. Is there an undecidable decision problem, and if there is, can we construct one?

  • Let \(\mathcal{F}\) be the set of all decision problems that can be solved in at most \(n^k\) time, and let \(\mathcal{U}\) be the set of all decision problems that can be solved in at most \(n^{k+1}\) time. Is there a decision problem in \(\mathcal{U}\) that is not in \(\mathcal{F}\), and if there is, can we construct one?

  • Let \(\mathcal{F}\) be the set of all provable statements and let \(\mathcal{U}\) be the set of all true statements. Is there a true statement that is not provable?

Even though all these questions have the same basic structure, they are quite different since they involve different types of objects. In order to identify a general technique that can be applied in different settings, we’ll express diagonalization as a statement involving functions, because many mathematical objects can be conveniently viewed as functions.

  • Sets. A set \(S \subseteq X\) can be viewed as a function \(f_S: X \to \{0,1\}\), where \(f_S(x) = 1\) if and only if \(x \in S\). This is called the characteristic function of the set.

  • Finite sequences. A sequence \(s\) of length \(k\) with elements from a set \(Y\) can be viewed as a function \(f_s : \{0,1,2,\ldots,k-1\} \to Y\), where \(f_s(i)\) is the \(i\)’th element of the sequence (starting counting from \(0\)). In other words, given \(f_s\), the corresponding sequence is \((f_s(0), f_s(1), \ldots, f_s(k-1))\).

  • Infinite sequences. Similarly, an infinite-length sequence \(s\) with elements from \(Y\) can be viewed as a function \(f_s : \mathbb N\to Y\).

  • Numbers. Numbers can be viewed as functions since the binary representation of a number is a sequence of bits (possibly infinite-length).

We now rephrase the question above using functions.

Given a set of functions \(\mathcal{F}\), each of the form \(f: X \to Y\), can we construct another function \(f_D : X \to Y\) not in \(\mathcal{F}\)?

Let’s develop a general technique to answer this question. We’ll start the discussion with a finite set of functions \(\mathcal{F}\) and then observe that things naturally generalize to infinite sets.

Suppose you are given a set \(\mathcal{F}\) of \(k\) functions \(f_1, f_2, \ldots, f_k\), each of the form \(f_i : X \to Y\), and asked to construct another function \(f_D : X \to Y\) that is not in \(\mathcal{F}\). We know that two functions \(f_i : X \to Y\) and \(f_j : X \to Y\) are different if there is some input \(x\) such that \(f_i(x) \neq f_j(x)\). So one natural idea is to go through the functions \(f_1, f_2, \ldots, f_k\) in \(\mathcal{F}\), one by one, and then make \(f_D\) differ from \(f_i\) by making it different for some input \(x_i\), i.e. make \(f_D(x_i) \neq f_i(x_i)\). For this strategy to work, all we need is that for each \(f_i\), we can pick a different \(x_i\) so that \(f_D\) can be constructed in a consistent way. That is, all we need is that \(|X| \geq k = |\mathcal{F}|\).

We can visualize this strategy with a table where the rows are labeled \(f_1, f_2, \ldots, f_k\), the columns are labeled \(x_1, x_2, \ldots, x_k\), and the entry corresponding to row \(f_i\) and column \(x_j\) is \(f_i(x_j)\). Then the construction of \(f_D\) involves going down the diagonal of the table and switching its value. For example, if \(Y = \{0,1\}\), we might have the following table.


This strategy naturally generalizes to infinite sets, leading to the following lemma.

Lemma (Diagonalization)

Let \(X\) be any set and let \(\mathcal{F}\) be a set of functions \(f: X \to Y\) where \(|Y| \geq 2\). If \(|X| \geq |\mathcal{F}|\), we can construct a function \(f_D : X \to Y\) such that \(f_D \not \in \mathcal{F}\).


The main idea is the following. For each \(f \in \mathcal{F}\), pick a unique input \(x \in X\) and define \(f_D(x)\) in a way such that it is different from \(f(x)\). Here, it is important that we pick a unique \(x\) for each \(f \in \mathcal{F}\) so that \(f_D\) can be defined consistently. The ability to pick a unique \(x\) for each \(f \in \mathcal{F}\) is equivalent to \(|X| \geq |\mathcal{F}|.\)

A bit more formally, since \(|X| \geq |\mathcal{F}|,\) there is an injection \(\phi : \mathcal{F}\to X.\) Let \(x_f = \phi(f).\) So \(f \neq f'\) implies \(x_f \neq x_{f'}.\) Define \(f_D\) such that for all \(f \in \mathcal{F},\) \(f_D(x_f) \neq f(x_f)\) (this is where the assumption \(|Y| \geq 2\) is used). This ensures that \(f_D\) is different from every \(f \in \mathcal{F},\) and therefore \(f_D \not \in \mathcal{F}.\) (Note that our description of \(f_D\) leaves it under-specified, but see the comment below.)

When we apply the above lemma to construct an explicit \(f_D \not\in \mathcal{F}\), we call this diagonalizing against the set \(\mathcal{F}.\) And we call \(f_D\) a diagonal element. Typically, there are several choices for the definition of \(f_D\):

  • Different injections \(\phi : \mathcal{F}\to X\) can lead to different diagonal elements.

  • If \(|Y| > 2\), we have more than one choice on what value we assign to \(f_D(x_f)\) that makes \(f_D(x_f) \neq f(x_f)\) (here \(x_f\) denotes \(\phi(f)\)).

  • If there are elements \(x \in X\) not in the range of \(\phi\), then we can define \(f_D(x)\) any way we like.

One of key features of the diagonalization technique is that it is a constructive argument: It does not just establish the existence of a certain object, but it gives us a way to construct it.

A direct corollary of the Diagonalization Lemma is the following. When \(\mathcal{F}\) is the set of all functions \(f: X \to Y\), we know we cannot diagonalize against \(\mathcal{F}\) and construct \(f_D\). So then we must have \(|X| < |\mathcal{F}|\). And if \(X\) is an infinite set, this implies \(\mathcal{F}\) is uncountable. For example, the set of all functions \(f: \mathbb N\to \{0,1\}\) is uncountable.

We state the corollary below for the special case of \(Y = \{0,1\}\), which is known as Cantor’s theorem.

Theorem (Cantor’s theorem)

Let \(\mathbf{F}(X)\) denote the set of all functions \(f: X \to \{0,1\}\). Then for any \(X\), \[|X| < |\mathbf{F}(X)|.\]

Corollary (Uncountable sets via diagonalization)

Let \(X\) be an infinite set. Then the set \(\mathbf{F}(X)\) is uncountable.

Limits of Computation

Are all decision problems \(f: \{0,1\}^* \to \{0,1\}\) decidable? We can view this question as a comparison of two sets. Let \(\mathcal{F}\) denote the set of all decidable decision problems. And let \(\mathbf{F}(\{0,1\}^*)\) be the set of all decision problems. Obviously \(\mathcal{F}\subseteq \mathbf{F}(\{0,1\}^*)\), but is the inclusion strict? And if it is, can we find an explicit undecidable decision problem?

To answer the first question, note that the corollary to Cantor’s theorem implies that \(\mathbf{F}(\{0,1\}^*)\) is uncountable. So most decision problems cannot be finitely represented/encoded. On the other hand, \(\mathcal{F}\) is countable because every decidable decision problem can be finitely described using a TM that solves/decides it. Since \(\mathbf{F}(\{0,1\}^*)\) is uncountable and \(\mathcal{F}\) is countable, we can conclude that almost all decision problems are undecidable.

Can we identify an explicit undecidable decision problem? Given our discussion in the previous section, obviously we should try to diagonalize against the set of all decidable decision problems \(\mathcal{F}\). The condition to apply diagonalization is \(|\mathcal{F}| \leq |\{0,1\}^*|\), and indeed, this is true since \(\mathcal{F}\) is encodable. So we can construct an explicit undecidable decision problem. Let’s see which explicit undecidable decision problem diagonalization spits out for us.

Instead of diagonalizing against the set of all decidable decision problems, it will be a bit more convenient to diagonalize against the set \(\mathcal{T}\) consisting of all TMs. We can do this because:

  • We can view a TM as a mapping from \(X = \{0,1\}^*\) to \(Y = \{0,1,\infty\}\).

  • Since the set of all TMs is encodable, we have \(|\mathcal{T}| \leq |X|\), and therefore the condition to apply diagonalization is satisfied.

To explicitly define the diagonal function \(f_D\), we pick an injection from \(\mathcal{T}\) to \(X = \{0,1\}^*\). The most obvious one is \(M \mapsto \left\langle M \right\rangle\). Then, we make \(f_D\) differ from a TM \(M\) by making sure \(f_D(\left\langle M \right\rangle) \neq M(\left\langle M \right\rangle)\).


To be more explicit, \[f_D(x) = \begin{cases} 1 & \text{if $x = \left\langle M \right\rangle$ for some TM $M$ and $M(\left\langle M \right\rangle) \in \{0, \infty\}$\;} \\ 0 & \text{otherwise.} \end{cases}\]

Once again, by construction, any TM \(M\) differs from \(f_D\) on input \(\left\langle M \right\rangle\), i.e. \[\text{for all TMs $M$, } \quad f_D(\left\langle M \right\rangle) \neq M(\left\langle M \right\rangle).\] Therefore, \(f_D\) is not decidable by any TM.

The decision problem \(f_D\) is known as Not-Self-Accepts, and from now on we’ll denote it by \(f_\mathrm{NSA}\). The inputs \(x\) such that \(f_\mathrm{NSA}(x) = 1\) are the encodings of TMs \(M\) such that \(M(\left\langle M \right\rangle)\) does not return True (i.e. \(M\) does not self-accept), so \(M(\left\langle M \right\rangle)\) either returns False or loops forever.

Theorem (1st Undecidability Theorem)

The decision problem \(f_\mathrm{NSA}\) is undecidable.

Remark (Diagonalization and self-referencing)

Often the diagonalization technique is referred to as a self-referencing technique. However, this probably suggests more meaning than is deserved. For instance, even though in the proof of the above theorem we chose the injection \(M \mapsto \left\langle M \right\rangle\), we certainly didn’t have to. The injection can be quite arbitrary and the string that \(M\) maps to can be unrelated to \(M\). We therefore prefer to think of self-reference as one instantiation of a more general technique.

Now that we have the 1st undecidable problem, we can use reductions to show that other problems are undecidable. To illustrate the concept, let’s consider the obvious attempt in trying to write an algorithm deciding \(f_\mathrm{NSA}\). (Recall that \(U\) denotes a universal TM.) \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\mathrm{NSA}(\left\langle \text{TM } M \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{return not $U(\left\langle M, \left\langle M \right\rangle \right\rangle)$}\\ \hline\hline\end{aligned}\] The problem with this algorithm is that if \(M(\left\langle M \right\rangle) = \infty\), then \(U(\left\langle M, \left\langle M \right\rangle \right\rangle)\) loops forever, and so \(M_\mathrm{NSA}(\left\langle M \right\rangle)\) loops forever and fails to return the correct output. On the other hand, if we knew for sure that \(M(\left\langle M \right\rangle)\) halts (so either accepts or rejects), then we could safely run \(U(\left\langle M, \left\langle M \right\rangle \right\rangle)\) and return the correct answer.

What we have implicitly argued above is that deciding \(f_\mathrm{NSA}\) reduces to deciding the halting decision problem. The halting problem is defined such that \(f_\mathrm{HALTS}(x) = 1\) if and only if \(x = \left\langle M,w \right\rangle,\) where the TM \(M\) halts on input \(w\). We argued above that if \(f_\mathrm{HALTS}\) is decidable (so there is a TM \(M_{\mathrm{HALTS}}\) deciding \(f_\mathrm{HALTS}\)), then we can decide \(f_\mathrm{NSA}\) as well, as follows. \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\mathrm{NSA}(\left\langle \text{TM } M \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{If $M_\mathrm{HALTS}(\left\langle M, \left\langle M \right\rangle \right\rangle)$ is False, return True}\\ &{\scriptstyle 2.}~~~~\texttt{Else return not $U(\left\langle M, \left\langle M \right\rangle \right\rangle)$}\\ \hline\hline\end{aligned}\] Since we know that \(f_\mathrm{NSA}\) is undecidable, this shows that \(f_\mathrm{HALTS}\) must be undecidable as well.

In general, the idea of a reduction is as follows. Let \(f\) and \(g\) be two decision problems. We say that deciding \(f\) reduces to deciding \(g\) (or simply, \(f\) reduces to \(g\)), if we are able to do the following: assume \(g\) is decidable (for the sake of argument), and then show that \(f\) is decidable by using the decider for \(g\) as a black-box subroutine (i.e. a helper function). Here the problems \(f\) and \(g\) may or may not be decidable to begin with. But if \(f\) reduces to \(g\) and \(g\) is decidable, then \(f\) is also decidable. Equivalently, if \(f\) reduces to \(g\) and \(f\) is undecidable, then \(g\) is also undecidable.

We can use reductions to expand the landscape of undecidable problems. Above we reduced \(f_\mathrm{NSA}\) to \(f_\mathrm{HALTS}\) to show \(f_\mathrm{HALTS}\) is undecidable. We could, for example, show that \(f_\mathrm{HALTS}\) reduces to another problem \(g\) to prove that \(g\) is undecidable. And indeed, \(f_\mathrm{HALTS}\) is a very popular choice for undecidability proofs like this.

What’s Next?

In the next post, we will prove Gödel’s incompleteness theorems. The main ingredient in the proof of the first incompleteness theorem will be the diagonalization argument that we used to show \(f_\mathrm{NSA}\) is undecidable. In particular, we’ll construct an explicit true statement that is unprovable. The second incompleteness theorem will be established via the concept of a reduction for provability. This is similar in flavor to showing \(f_\mathrm{HALTS}\) is undecidable using a reduction.

This representation is particularly intuitive for us since we humans use strings when communicating in written form. And this is one of the main advantages of mathematically representing data with strings.