One of the best methods we have for creating reliable knowledge and explanations is *mathematical modeling and reasoning*.

Given something we want to understand better in the real world, we first create a mathematical/abstract model that captures the essence of it. Once we have a model, we can reason about it logically and derive new knowledge from it. We then hope that this new knowledge and understanding can be applied back in the real world.

One field that beautifully exemplifies the power of the above approach is physics (e.g. think about the amazing success of the theory of general relativity and quantum field theory). Another great example is computer science. To create a deep and reliable understanding of computation, we model it mathematically (e.g. the TM model), and then study it rigorously.

Good Old Regular Mathematics, or \(\text{GORM}\) for short, refers to the right-hand side of the above diagram. It refers to the usual mathematical reasoning that we have been doing for thousands of years. We have been mostly confident in the tools of mathematical reasoning, and the new knowledge and understanding it produces. But things started to change in the 19th century, with the concept of infinity and infinite sets playing an important role in mathematical reasoning. Certain ideas involving infinity were not well-defined. There were serious disagreements about whether infinite sets could be mathematically analyzed similar to finite sets. And people realized that a naı̈ve definition of a set led to paradoxes. So it became clear that we needed to put \(\text{GORM}\) on a secure foundation. We needed to know exactly which reasoning principles were valid, and which were not, so that we could be certain about the knowledge that we created.

Given that we want to better understand this real-world human process that we call \(\text{GORM},\) it is natural to plug *it* into the diagram above as “Something of interest”. So we can create a mathematical model for \(\text{GORM}\) itself. We’ll call it FORM (short for formalization of mathematics).

This has a couple of important advantages.

First, FORM forces us to make precise all the elements of \(\text{GORM}.\) So we have to make precise what a mathematical statement is and what a proof is (which involves identifying our axioms and the deductions rules that we deem valid). With this formalization, we can hopefully agree on what constitutes a valid mathematical argument, but even if we can’t, we can pinpoint exactly where the disagreements are.

The second advantage is that FORM allows us to study \(\text{GORM}\) rigorously and discover interesting properties about it. In particular, we can study the power and limitations of mathematical reasoning, similar to how the formalization of computation allows us to study its power and limitations.

If the previous paragraph makes you a bit uneasy, that’s completely normal! There is a meta nature to it all. We are talking about creating a mathematical model for \(\text{GORM}\) (that we call FORM), and then proving properties of FORM (e.g. the incompleteness theorems) using \(\text{GORM}\). This means that the things we prove about FORM using \(\text{GORM}\) can be formalized in FORM (e.g. the incompleteness theorems can be formalized). There are no issues with any of this; we just need to be confident that the FORM we use is a correct model for \(\text{GORM}\).

In this section, we will identify the key components of FORM that we’ll need to study it rigorously.

We can think of \(\text{GORM}\) as the following two human processes.

\(\mathrm{Prover}\): Given a statement \(S\), come up with a proof \(P\) of \(S\) (if it exists).

\(\mathrm{Verifier}\): Given a statement \(S\) and an argument \(P\), verify whether \(P\) is a proof of \(S\).

The process of coming up with a proof can be messy, it can be inefficient, and we don’t necessarily expect to always produce a correct output. On the other hand, we agree that verifying a proof should be a mechanistic and efficient process. Given any statement \(S\) and any argument \(P\), we should always be able to figure out whether \(P\) is actually a proof of \(S\). So we agree that there should be an algorithm (a decider TM) that corresponds to the Verifier.

Let’s now make 2 observations. First, given that the Verifier denotes a decider TM, statements and proofs need to be encodable with finite-length strings. Second, given a Verifier, we can build a TM for the Prover as follows: try all possible arguments \(P\) one by one (e.g. in lexicographical order) and then using the Verifier, check if \(P\) is a proof of \(S\), and if it is, output \(P\). This is of course not an efficient Prover (and it may not even halt for all inputs), but it is a useful one to consider, especially because the efficiency of the Prover (or the Verifier) will not play any role in the results presented here.

Even though our main interest is formalizing all mathematical reasoning, it is useful to build a general framework for formalizing any *subset* of mathematical reasoning. For instance, we may want to come up with a formalization that captures mathematical reasoning for plane geometry. Or we may want to formalize basic arithmetic. Or we may want to formalize probability theory. For any area \(A\) of mathematics, let \(\text{GORM}_A\) denote Good Old Regular Mathematics restricted to \(A\). Without the subscript, \(\text{GORM}\) represents all mathematical reasoning.

The elements of \(\text{GORM}_A\) that we would like to formalize are actually captured by the \(\mathrm{Verifier}\), so we can just focus on it and make sure that we formalize all its pieces. At a high level, a *proof system* \(\text{PS}_A\) is a mathematical model/formalization of \(\text{GORM}_A\) with the following properties.

For every statement \(S\) in \(\text{GORM}_A\) with a truth value, there is a precise representation of \(S\) in \(\text{PS}_A\) as a finite-length string, denoted \(\left\langle S \right\rangle\). We let \({\neg S}\) denote the negation of \(S\).

For every argument \(P\) in \(\text{GORM}_A\), there is a precise representation of \(P\) in \(\text{PS}_A\) as a finite-length string, denoted \(\left\langle P \right\rangle\).

\(\text{PS}_A\) specifies a decider TM \(V\) (called a

*verifier*) such that \(V(\left\langle S,P \right\rangle)\) returns True if and only if \(P\) is a proof of statement \(S\) in \(\text{GORM}_A\).

A statement \(S\) is called *provable* if there exists a string \(w\) such that \(V(\left\langle S,w \right\rangle)\) returns True. A statement \(S\) is called *independent* if neither \(S\) nor \({\neg S}\) is provable.

As observed above, given the verifier \(V\), we can build a Prover as follows. \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; \mathrm{Prover}(\left\langle S \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{For $k = 1, 2, 3, \ldots$}\\ &{\scriptstyle 2.}~~~~~~~~\texttt{For every string $w$ of length $k$:}\\ &{\scriptstyle 3.}~~~~~~~~~~~~\texttt{If $V(\left\langle S, w \right\rangle)$: return $w$.}\\ \hline\hline\end{aligned}\] Note that if \(S\) has a proof, then \(\text{Prover}(\left\langle S \right\rangle)\) will eventually find that proof and return it.

The computational task that the \(\mathrm{Prover}\) tries to solve has a decision problem version: \(f_\mathrm{PROVABLE}(x)\) returns True if and only if \(x = \left\langle S \right\rangle\), where \(S\) is a provable statement. We have the following algorithm for \(f_\mathrm{PROVABLE}.\) \[\begin{aligned}
\hline\hline\\[-12pt]
&\textbf{def}\;\; \mathrm{Resolver}(\left\langle S \right\rangle):\\
&{\scriptstyle 1.}~~~~\texttt{For $k = 1, 2, 3, \ldots$}\\
&{\scriptstyle 2.}~~~~~~~~\texttt{For every string $w$ of length $k$:}\\
&{\scriptstyle 3.}~~~~~~~~~~~~\texttt{If $V(\left\langle S, w \right\rangle)$: return True.}\\
&{\scriptstyle 4.}~~~~~~~~~~~~\texttt{If $V(\left\langle {\neg S}, w \right\rangle)$: return False.}\\
\hline\hline\end{aligned}\] At this point we do not know whether \(\mathrm{Resolver}\) is a TM that *decides* \(f_\mathrm{PROVABLE},\) but we’ll come back to this later.

Is there an agreed upon proof system \(\text{PS}\) that faithfully captures \(\text{GORM}\)? Yes, there are indeed several choices for \(\text{PS}\). A popular one is the Zermelo–Fraenkel-Choice (ZFC) axiomatic system for set theory. This system specifies a set of axioms as well as some basic deduction rules that we can use to prove statements. We will not define ZFC since we won’t need its formal definition. Instead, we will appeal to the following thesis.

Recall that the Church-Turing Thesis is a \(\text{GORA}\)-to-TM thesis (where \(\text{GORA}\) is short for Good Old Regular Algorithms). It states that every algorithm “compiles down” to a TM. \(\text{GORM}\)-to-ZFC Thesis states that every mathematics proof compiles down to a proof in ZFC.

When we are proving properties of algorithms, we usually don’t directly work with TMs, but rather appeal to the Church-Turing Thesis: we work with high-level algorithms with the understanding that they can be translated to TMs. Similarly, when we prove properties of mathematical reasoning, we won’t directly work with ZFC, but appeal to the \(\text{GORM}\)-to-ZFC Thesis: we will use high-level mathematical statements and proofs with the understanding that they can be translated into ZFC statements and proofs.

Now that we know the basic defining features of a proof system, let’s talk about the properties we would hope our proof system to have, and see whether we can prove that our proof system has those properties.

Let \(\text{PS}\) be a proof system.

\(\text{PS}\) is

*consistent*means that for all statements \(S\), if \(S\) is provable, then \({\neg S}\) is not provable. Or equivalently, for all \(S\), at most one of \(S\) and \({\neg S}\) is provable.\(\text{PS}\) is

*sound*means that for all statements \(S\), if \(S\) is provable, then \(S\) is true. This is equivalent to saying that \(\text{PS}\) does not prove false statements.\(\text{PS}\) is

*complete*means that for every statement \(S\), either \(S\) or \({\neg S}\) is provable (i.e. at least one of \(S\) and \({\neg S}\) is provable). So \(\text{PS}\) is*incomplete*if there exists an independent statement (i.e. a statement \(S\) such that neither \(S\) nor \({\neg S}\) is provable).

Some observations about these definitions are in order.

If \(\text{PS}\) is

**sound**, then it must be**consistent**. So soundness is a stronger requirement.If \(\text{PS}\) is both

**consistent**and**complete**, then for all statements \(S\), exactly one of \(S\) and \({\neg S}\) is provable.If \(\text{PS}\) is both

**sound**and**complete**, then in addition to the observation in the previous point, provable statements would be true. And therefore truth would correspond exactly to provability!

Let’s now observe what these things imply about the \(\mathrm{Resolver}\) TM that we defined earlier.

If \(\text{PS}\) is

**sound**, then whenever \(\mathrm{Resolver}\) halts (returns True or False), it gives the correct answer. In other words, whenever \(\mathrm{Resolver}(\left\langle S \right\rangle)\) returns True or False, it*resolves*\(S\) correctly. But soundness by itself does not imply that \(\mathrm{Resolver}\) is a decider TM (that is, it may potentially loop forever for some inputs).If \(\text{PS}\) is

**complete**, then \(\mathrm{Resolver}\) is a decider TM (i.e. it halts and returns True or False on all inputs). Or equivalently, if \(\mathrm{Resolver}\) does not halt on some input \(\left\langle S \right\rangle\), then neither \(S\) nor \({\neg S}\) is provable, so \(\text{PS}\) would be incomplete.If \(\text{PS}\) is both

**sound**and**complete**, \(\mathrm{Resolver}\) would be a decider TM that correctly resolves every statement. In other words, for all statements \(S\), if \(S\) is true, then \(\mathrm{Resolver}(\left\langle S \right\rangle)\) returns True, and if \(S\) is false, then \(\mathrm{Resolver}(\left\langle S \right\rangle)\) returns False.

Our ideal is to have a sound and complete proof system, so that truth can be modeled with provability, and \(\mathrm{Resolver}\) is a decider for truth. The following quote of David Hilbert nicely expresses his belief in this kind of dream.

For us there is no ignorabimus, and in my opinion none whatever in natural science. In opposition to the foolish ignorabimus our slogan shall be Wir müssen wissen — wir werden wissen (“We must know — we will know.”)

Hilbert challenged mathematicians to come up with a proof system realizing the dream. This is known as *Hilbert’s program*.

Come up with a proof system \(\text{PS}\) that formalizes \(\text{GORM}\), and furthermore, prove that \(\text{PS}\) is consistent and that it is complete.

Why don’t we ask for a proof of soundness? This is a bit of a subtle point, but soundness is not something we can prove. For example, in an axiomatic proof system like ZFC, proving that it is sound is equivalent to proving the axioms. However, the truth of the axioms is not something we can prove, but is something we have to assume. This is necessary in order to avoid an infinite regress. We build everything on top of a base set of assumed truths.

Unfortunately (or perhaps “fortunately” depending on your views), it turns out Hilbert’s dream is impossible.

In this section we will prove Gödel’s incompleteness theorems. These theorems apply to any proof system \(\text{PS}\) rich enough to formalize basic arguments about TMs. In particular, \(\text{PS}\) does not have to be as rich as ZFC (which captures all mathematical reasoning).

Here is the outline for this section:

The first result/observation will be to prove that if ZFC is sound, then there must be an independent statement (so ZFC is incomplete).

The above result is non-constructive, so we’ll next come up with an

*explicit*independent statement \(S\). So we’ll prove, if ZFC is sound, then the statement \(S\) is independent.Next, we will prove the “consistency version” of the above result: we will show that if ZFC is consistent, the same statement \(S\) is independent. This is a strengthening of the above result because the consistency assumption is weaker than the soundness assumption.

The first three bullet points here are different versions of the 1st Incompleteness Theorem, but in the literature, the 1st Incompleteness Theorem often refers to this consistency version.

The weakening of the assumption from soundness to consistency will allow us to prove the 2nd Incompleteness Theorem, which states that if ZFC is consistent, then \(S = \textrm{``}\text{ZFC is consistent}\textrm{''}\) is not provable in ZFC. So if ZFC is consistent, it cannot prove its own consistency.

Notice that this completely shatters Hilbert’s program. We cannot have a formalization of \(\text{GORM}\) that is complete, and furthermore, we cannot prove the consistency of our formalization.

Before we dive into the proofs, it is worth mentioning a more fundamental limitation of humans doing mathematics. You might call this the 0th Incompleteness Theorem. Like any computing device, we are limited to working with finite-length strings. So we can only reason about a tiny sliver of mathematics corresponding to things with a finite description. For instance, most decision problems do not have a finite description, so we cannot even talk/reason about most individual decision problems (this is not in conflict with the fact that we can talk/reason about the set of all decision problems, which does have a finite description). Gödel’s incompleteness theorems point to the limitations of our reasoning *within* this tiny sliver of finitely describable mathematics.

The 1st Incompleteness Theorem, which is philosophically the most important/influential one, turns out to be a straightforward corollary of the 1st Undecidability Theorem that we proved in the previous post. In particular, we can see that if Hilbert’s dream was realizable (i.e. if we had a sound and complete proof system), truth would correspond to provability, and the TM \(\mathrm{Resolver}\), for any statement \(S\), would correctly output its truth value. Such a computable truth oracle is too good to be true. It would allow us to decide undecidable problems. For example, we can decide \(f_\mathrm{NSA}\) as follows. \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\mathrm{NSA}(\left\langle M \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Return $\mathrm{Resolver}(\left\langle \textrm{``}M(\left\langle M \right\rangle) \text{ does not accept}\textrm{''} \right\rangle)$.}\\ \hline\hline\end{aligned}\]

The careful reader will notice that within \(M_\mathrm{NSA}\), we need to computationally create the string \(\left\langle \textrm{``}M(\left\langle M \right\rangle) \text{ does not accept}\textrm{''} \right\rangle\) from the string \(\left\langle M \right\rangle\). This is not hard since the former is equivalent to \(\left\langle \textrm{``}U(\left\langle M, \left\langle M \right\rangle \right\rangle) \text{ does not accept}\textrm{''} \right\rangle\), where \(U\) is a universal TM (which has a representation in FORM that we can hard-code), and the input to \(U\) is \(\left\langle M, \left\langle M \right\rangle \right\rangle\), which we can easily create from \(\left\langle M \right\rangle\).

There is nothing particularly special about \(f_\mathrm{NSA}\), and we could have worked with \(f_\mathrm{HALTS}\) as well (or any other explicit undecidable decision problem). If we have a sound and complete proof system, the following would be a decider TM for \(f_\mathrm{HALTS}\). \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\mathrm{HALTS}(\left\langle M, x \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Return $\mathrm{Resolver}(\left\langle \textrm{``}M(x) \text{ halts}\textrm{''} \right\rangle)$.}\\ \hline\hline\end{aligned}\] We have thus proven the following version of the incompleteness theorem.

ZFC cannot be both sound and complete. In other words, if ZFC is sound, then it must be incomplete, i.e. \[\text{ZFC is sound} \implies \text{ZFC is incomplete}.\]

This is the most significant result in this post, highlighting the inherent limitation of mathematical reasoning. That being said, we can build on it to say other interesting things. The above theorem implies that if ZFC is sound, then there exists some statement that is independent of ZFC. But it does not give us an explicit statement that is independent. Our task now is to identify an explicit independent statement (assuming ZFC is sound).

For a TM \(M\), let \[S_M = \textrm{``}M \text{ does not self-accept}\textrm{''}.\] Using this notation, we can describe the TM \(M_\mathrm{NSA}\) as follows. \[\begin{aligned} \hline\hline\\[-12pt] &\textbf{def}\;\; M_\mathrm{NSA}(\left\langle M \right\rangle):\\ &{\scriptstyle 1.}~~~~\texttt{Return $\mathrm{Resolver}(\left\langle S_M \right\rangle)$.}\\ \hline\hline\end{aligned}\] This describes a valid TM, but it cannot be a correct decider for \(f_\mathrm{NSA}\) since \(f_\mathrm{NSA}\) is undecidable. This means that for some input, \(M_\mathrm{NSA}\) fails to give the correct answer. That is, for some TM, let’s call it \(M^*\), \[M_\mathrm{NSA}(\left\langle M^* \right\rangle) \neq f_\mathrm{NSA}(\left\langle M^* \right\rangle).\] When a TM fails to give the correct answer on a particular input, it either loops forever for that input, or it halts (returns True or False), but it returns the wrong answer. Since \(M_\mathrm{NSA}(\left\langle M^* \right\rangle)\) simply returns \(\mathrm{Resolver}(\left\langle S_{M^*} \right\rangle)\), we have the following two possibilities:

**Case (i)**: \(M_\mathrm{NSA}(\left\langle M^* \right\rangle)\) loops forever.In this case \(\mathrm{Resolver}(\left\langle S_{M^*} \right\rangle)\) loops forever, which means it cannot find a proof of \(S_{M^*}\) nor its negation. Therefore \(S_{M^*}\) is independent.

**Case (ii)**: \(M_\mathrm{NSA}(\left\langle M^* \right\rangle)\) halts.In this case \(\mathrm{Resolver}(\left\langle S_{M^*} \right\rangle)\) halts and returns True or False, but it resolves \(S_{M^*}\) incorrectly.

We are assuming ZFC is sound, and as we observed before, this means that for all statements \(S\), if \(\mathrm{Resolver}(\left\langle S \right\rangle)\) returns True or False, then it resolves \(S\) correctly. Therefore case (ii) cannot happen. So assuming ZFC is sound, we must be in case (i), and for some TM \(M^*\), the statement \(S_{M^*}\) is independent!

Now the question is whether we can identify such a TM \(M^*\). For which \(M^*\) does \(M_\mathrm{NSA}(\left\langle M^* \right\rangle)\) loop forever? Or equivalently, for which \(M^*\), do we have \(M_\mathrm{NSA}(\left\langle M^* \right\rangle) \neq f_\mathrm{NSA}(\left\langle M^* \right\rangle)\)? Recall how we constructed/defined \(f_\mathrm{NSA}\) using diagonalization: \[\text{For every TM $M$, } \quad f_\mathrm{NSA}(\left\langle M \right\rangle) \neq M(\left\langle M \right\rangle).\] And this is also true for \(M = M_\mathrm{NSA}\).

So \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle) \neq f_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\). And therefore for \(M^* = M_\mathrm{NSA}\), the statement \(S_{M^*}\) is independent!

Notice that in the picture above, we have filled the entry corresponding to \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) with \(\infty\) because we know that when \(M_\mathrm{NSA}\) disagrees with \(f_\mathrm{NSA}\), it loops forever (case (i) above). So \(M_\mathrm{NSA}\) indeed does not self-accept, and \(S_{M_\mathrm{NSA}}\) is a true statement that is unprovable.

If ZFC is sound, then the statement \(S_{M_\mathrm{NSA}}\) is true and is independent of ZFC. I.e., \[\begin{aligned} \text{ZFC is sound} & \implies S_{M_\mathrm{NSA}} \text{ is true}, \\ \text{ZFC is sound} & \implies S_{M_\mathrm{NSA}} \text{ is independent}.\end{aligned}\]

We will now relax the assumption “ZFC is sound” to “ZFC is consistent” and show that the above theorem still holds. That is, we’ll prove:

If ZFC is consistent, then the statement \(S_{M_\mathrm{NSA}}\) is true and is independent of ZFC. I.e., \[\begin{aligned} \text{ZFC is consistent} & \implies S_{M_\mathrm{NSA}} \text{ is true}, \\ \text{ZFC is consistent} & \implies S_{M_\mathrm{NSA}} \text{ is independent}.\end{aligned}\]

The key to proving the soundness version of the incompleteness theorem was to assume ZFC is sound, and then rule out case (ii) above. Recall that in case (ii), \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) halts and returns True or False (so \(\mathrm{Resolver}(\left\langle S_{M_\mathrm{NSA}} \right\rangle)\) halts and returns True or False), but \(\mathrm{Resolver}\) resolves \(S_{M_\mathrm{NSA}}\) incorrectly.

In order to rule out case (ii), the soundness assumption is an overkill. We don’t need to assume that ZFC is sound *for all* statements \(S\). It suffices to do the following:

Assume \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) halts.

Show that ZFC must be sound with respect to the statement \(S_{M_\mathrm{NSA}}\). That is, show that \(\mathrm{Resolver}(\left\langle S_{M_\mathrm{NSA}} \right\rangle)\) resolves \(S_{M_\mathrm{NSA}}\) correctly.

Even though we cannot assume ZFC is sound, we can argue that it must be sound with respect to certain statements, and in particular, with respect to \(S_{M_\mathrm{NSA}}\). The key is to combine the assumption “ZFC is consistent” with \(\text{GORM}\)-to-ZFC thesis. Here is the high-level overview of how to combine these to argue that ZFC must be sound with respect some statement \(S\): Certain true statements \(S\) are provable in \(\text{GORM}.\) By \(\text{GORM}\)-to-ZFC thesis, \(S\) is provable (in ZFC). The assumption of consistency then allows us to say that \({\neg S}\) is not provable. Therefore, \(\mathrm{Resolver}(\left\langle S \right\rangle)\) correctly resolves \(S\) and returns True (so ZFC is sound with respect to \(S\)).

Let’s now apply this strategy to carry out the two bullet points above. We start by assuming \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) halts, and without loss of generality, suppose \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) returns False. Then in \(\text{GORM},\) we can prove \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) returns False by providing the execution trace of the TM. So in \(\text{GORM},\) we can prove \(S_{M_\mathrm{NSA}}.\) By \(\text{GORM}\)-to-ZFC thesis, \(S_{M_\mathrm{NSA}}\) is provable (in ZFC). By the assumption of consistency, \({\neg S_{M_\mathrm{NSA}}}\) is not provable. Therefore, \(\mathrm{Resolver}(\left\langle S_{M_\mathrm{NSA}} \right\rangle)\) correctly resolves \(S_{M_\mathrm{NSA}}\) and returns True.

To lay out these implications more explicitly: \[\begin{aligned} & M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle) \text{ returns False} & \\ & \implies \; \textrm{``}M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle) \text{ returns False'' is provable in GORM} & \text{(execution trace is a proof)} \\ & \implies \; S_{M_\mathrm{NSA}} \text{ is provable in GORM} & \\ & \implies \; S_{M_\mathrm{NSA}} \text{ is provable (in ZFC)} & \text{ (by GORM-to-ZFC thesis)}\\ & \implies \; {\neg S_{M_\mathrm{NSA}}} \text{ is not provable (in ZFC)} & \text{ (by consistency)} \\ & \implies \; \mathrm{Resolver}(\left\langle S_{M_\mathrm{NSA}} \right\rangle) \text{ returns True}. &\end{aligned}\] The argument is analogous when \(M_\mathrm{NSA}(\left\langle M_\mathrm{NSA} \right\rangle)\) returns True. Therefore, we have successfully arrived at the desired conclusion: assuming ZFC is consistent, case (ii) above cannot happen, so we must be in case (i), and \(S_{M_\mathrm{NSA}}\) is independent!

Recall that in the previous post, we used diagonalization to construct the first undecidable decision problem, namely \(f_\mathrm{NSA}\). Then, we saw that using the idea of a *reduction*, we can show that other problems are undecidable, like \(f_\mathrm{HALTS}\). In particular, we showed that deciding \(f_\mathrm{NSA}\) *reduces to* deciding \(f_\mathrm{HALTS}\). And since \(f_\mathrm{NSA}\) is undecidable, this implies \(f_\mathrm{HALTS}\) must be undecidable. Reductions allow us to expand the landscape of undecidable problems.

In this post, we have shown that \(S_{M_\mathrm{NSA}}\) is independent. We can then hope to show that other statements are independent (i.e. expand the landscape of independent statements) using some notion of a reduction for provability.

We’ll say proving statement \(S\) *reduces to* proving statement \(T\) (or simply \(S\) *reduces to* \(T\)) if the statement \(\textrm{``}T \implies S\textrm{''}\) is provable.

Notice that if \(S\) reduces to \(T\), and \(T\) is provable, then \(S\) is provable (combining the proof of \(\textrm{``}T \implies S\textrm{''}\) with the proof of \(T\), we get a proof of \(S\)). The contrapositive of this is that if \(S\) reduces \(T\), and \(S\) is *un*provable, then \(T\) must be unprovable. So if we start with an unprovable statement \(S\), and we can reduce \(S\) to \(T\) (i.e., show \(\textrm{``}T \implies S\textrm{''}\) is provable), then we can conclude \(T\) is unprovable.

We already know \(S_{M_\mathrm{NSA}}\) is unprovable (assuming ZFC is consistent). So can we reduce \(S_{M_\mathrm{NSA}}\) to another statement \(T\) to conclude \(T\) is also unprovable? Or in other words, can we show \(\textrm{``}T \implies S_{M_\mathrm{NSA}}\textrm{''}\) is provable for some \(T\)?

In this post, we have in fact \(\text{GORM}\)-proved \(\textrm{``}\text{ZFC is consistent} \implies S_{M_\mathrm{NSA}}\textrm{''}\). And by \(\text{GORM}\)-to-ZFC thesis, we know \(\textrm{``}\text{ZFC is consistent} \implies S_{M_\mathrm{NSA}}\textrm{''}\) is provable in ZFC. Therefore, \(\textrm{``}\text{ZFC is consistent}\textrm{''}\) is unprovable.

If ZFC is consistent, then the statement “ZFC is consistent” is not provable in ZFC.

In the first post, we saw the limits of computation as a direct corollary of diagonalization. And in this post, we saw the limits of mathematical reasoning as a direct corollary of the limits of computation, because mathematical reasoning is computation.

If we inspect the diagonalization argument to construct an explicit undecidable decision problem, we can see that we never really made use of the fact that TMs represent a *computational* model. The crucial part of the argument was that every TM is a finite object, i.e., the set of TMs is encodable. So we only needed that the TM model is a *finite-description* model for decision problems. This means that for other finite-description models (e.g. oracle TM model, or even ZFC itself), we can carry out the same argument. Even if mathematical reasoning was not a computation, but still had the finiteness restriction (statements and proofs have finite encodings, and the verifier can be explicitly defined as a function), we could argue about its limits.

As a fun exercise, we encourage you to think about what would happen if we diagonalized against the set of all decision problems that are finitely describable in ZFC. What conclusions can we reach from it?

It is well-known that the soundness version of the 1st Incompleteness Theorem can be elegantly expressed using the language of computation. The consistency version of the 1st Incompleteness Theorem as well as the 2nd Incompleteness Theorem can also be nicely expressed using the language of computation, however this fact is less well-known. See Scott Aaronson’s blog post here. And Amit Sahai’s comments here and here. In our opinion, Ryan O’Donnell has the best exposition for these results. My presentation here is very much influenced by Ryan’s, but the perspective is different. I am very grateful to Ryan for many hours of discussions on the incompleteness theorem and the topics surrounding it. His thoughts and guidance were crucial in the creation of this exposition. And I have borrowed some of the terminology that he used in his presentation and the ones he suggested in our conversations.