# Four color problem, odd Goldbach conjecture, and the curse of computing

For over twenty-three hundred years, at least since the publication of Euclid’s Elements, the conjecture and proof of new theorems has been the sine qua non of mathematics. The method of proof is at “the heart of mathematics, the royal road to creating analytical tools and catalyzing growth” (Rav, 1999; pg 6). Proofs are not mere justifications for theorems; they are the foundations and vessels of mathematical knowledge. Contrary to popular conception, proofs are used for more than supporting new results. Proofs are the results, they are the carriers of mathematical methods, technical tricks, and cross-disciplinary connections.

Of course, at its most basic level, a proof convinces us of the validity of a given theorem. The dramatic decisiveness of proofs with respect to theorems is one of the key characteristics that set mathematics apart from other disciplines. A mathematical proof is unique in its ability to reveal invalid conclusions as faulty even to the author of that conclusion. Contrast this with Max Planck’s conception of progress in science:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Further, unlike in science, a mathematical conclusion is shown to be faulty by the proofs and derivations of other mathematicians, not by external observations. In fact, if we use Karl Popper’s falsification as the criterion for science then mathematics (as practiced) is clearly distinct. A typical statement, such as the recently proven odd (or, weak) Goldbach conjecture states:

Weak Goldbach conjecture

Every odd integer $n > 5$ is the sum of three primes.

Given any specific integer $n$, we can verify that it is the sum of three primes by looking at the finite list of primes less than $n$ and seeing if any three of them (with repetitions allowed) add up to $n$. In other words, we could set a computer program running to check odd numbers one at a time, and if we ever came to one that can’t be written as the sum of three primes then we would have falsified (in the proper Popper sense) our conjecture.

“What makes this unscientific?”, you ask, “the weak Goldbach conjecture is precise and falsifiable, Popper would call it science.” The distinction is in how math is practiced, if the weak Goldbach conjecture only had to pass scientific standards of truth then it would have been an accepted result 271 years ago when Christian Goldbach first doodled it in the margins of his letter to Leonhard Euler. We wouldn’t need top mathematicians like H.A. Helfgott proving it.

What the mathematical community needs is a clear argument that shows that the conjecture holds for all odd integers, not just the ones we bothered to test. A poor programmer might jump at this opportunity by (1) writing down a program that looks at the odd integers one at a time and halts only if finds a counterexample. He could then (2) write a second program that takes the code of any program and decides if it will halt or not; maybe if the task proves too difficult, he could offer \$1000 for it on GetACoder.com. Finally, (3) if the second program reports that the first program halts then there must be a counterexample and the conjecture is false, and if the second program reports that the first program doesn’t halt then the odd Goldbach conjecture is true.

Of course, you realized that the second step is the Halting problem that Alan Turing famously proved to be undecidable. For computer scientists, in fact, the saddest news in the resolution of the odd Goldbach conjecture is that they can no longer use it as an example when introducing the Halting problem in COMP-101. However, as theorists, we can suspend our disbelief and pretend that maybe we could solve the Halting problem, what would that mean for mathematics?

If we simply had a mechanical oracle like Rav’s PYTHIAGORA that can read any conjecture and definitely declare if it is valid or not and “if all our toil and sweat in proving theorems served only to ascertain whether they were true or not, then PYTHIAGORA would deliver us of useless labors and frustrations” (Rav, 1999; pg. 6). Such a road would not lead us towards a mathematical paradise and Rav shows that if PYTHIAGORA existed then it “would have dealt a death blow to mathematics, for we would cease having ideas and candidates for conjectures” (Rav, 1999; pg. 6). A proof does more than verify a conjecture, it (often) establishes connections between fields, or introduces new methods and questions that are useful beyond the proven theorem. Most importantly, a proof is not a blackbox, it is a narrative to be consumed by the mathematician. What the mathematical community wants is the argument for the odd Goldbach conjecture to be beautiful and insightful, to build understanding.

Although the universal PYTHIAGORA of our imagination is impossible, computer assisted proofs run the risk of moving us closer to this dystopia. Any discussion of computers in mathematics has to open with the four color problem; a problem that began its life in 1852 as a conjecture in graph theory: given any separation of a plane into contiguous regions, are four colors enough to color the regions of the map so that no two regions that share a common boundary (that is not a corner, where corners are the points shared by three or more regions) have the same color? In 1977, Kenneth Appel, Wolfgang Haken, and John Koch published a computer-assisted proof and birthed the new four color problem, a philosophical question about the nature of proof and the role of computers in mathematics. Thomas Tymoczko addressed the new problem in his 1979 paper: “The four-color problem and its philosophical significance”.

Tymoczko (1979) characterizes proofs based on the anthropology, epistemology and logic of mathematics. Proofs are (a) convincing, (b) surveyable, and (c) formalizable. Surveyability plays a key role in Tymoczko’s analysis and he defines a proof as surveyable if it can be looked over and verified by a rational agent under realistic constraints. Without surveyability, a proof is not a priori.

In the four-color theorem (4CT), a key lemma was obtained by a computer. Appel and Haken provided an inductive proof for the 4CT, which relies on considering the reducibility of 1834 configurations. Note that unlike the poor programmer’s approach to the odd Goldbach conjecture, real math is involved in first reducing the infinite number of possible partitions of the plane into a finite number of restricting configurations. Establishing the reducibility of each configuration is tedious and long, but algorithmic. Koch provided the algorithm for verifying the reducibility of the configurations and the verification by computer serves as the lemma which completes the proof by induction.

Yesterday’s proof of the odd Goldbach conjecture by Harald Andrés Helfgott (2012,2013) follows a similar pattern of reducing an infinite statement to a proof with a large and finte number of casses to verify by computer. Helfgott applies the circle method that was developed by Hardy and Ramanujan in 1916. The method usually produces statements of the form ‘for sufficiently large $n$ (i.e. $n > n_0$) such-and-such property holds’. Helfgott refined the method to have ‘such-and-such property’ mean ‘is a sum of 3 primes’, and tightened $n_0$ to be about $10^{30}$. This application of circle method was the math of the proof, and a computer search was used to manually establishing a lemma for $n < 10^{30}$.

The brute-force proof of this computer lemma is so long that no mathematician has, or even can, survey the proof; thus, Tymoczko (1979) argues, mathematicians must trust the accuracy of their computers – an experiment! If mathematicians are to accept the 4CT they must acknowledge the lemma as a computer experiment, much like ones in natural sciences. For Tymoczko (1979), this distinction for computer assisted proofs makes them a posteriori and leads down a road of quasi-empiricism.

From my experience as a programmer and the difficulty of ensuring that code is accurate, I agree that assistance from a computer can sometimes feel like an experiment. However, Gontheir (2009) formally shown that there are no programming errors in the proof of the 4CT. But that doesn’t make the proof surveyable in Tymoczko’s sense; at least not by a human, only by the rational agency of a computer. But this is not unique to computer assisted proofs, most modern mathematical proofs are so extensive that many of them are not surveyable from first principles; the proofs rely on previous theorems and results. For many modern proofs, a single mathematician has no chance of surveying the whole proof and all the previous results it is based upon (and the results those results are based on, etc.). Mathematicians rely on modularity; they take previously proven statements as true and seldom question their validity. So surveying a proof is simply looking at the novel material and taking previous results as given, therefore a proof is surveyable not if a single rational agent can survey it, but if a collection of agents can do it. With my definition of surveyability that accounts for the modularity of mathematics, surveying the 4CT becomes much more tractable. Even though one mathematician has no chance of surveying all 1834 configurations considered by the computer, she can survey a single configuration and together 1834 mathematicians can survey the 4CT. Thus, I cannot accept Tymoczko’s argument that computer assisted proofs are necessarily a posteriari, but I agree that something smells fishy.

For me, the issue is not general surveyability, but internalization. No mathematician fully understands the computational part of the proof, at least no more than a pointy-haired boss understands the task his engineers completed. Although some AI enthusiasts might argue that the computer understands its part of the proof, most mathematicians (or people in general) would be reluctant to admit computers as full participants in the “social, informal, intuitive, organic, human process” (De Millo, Lipton, & Perlisp, 1979; pg. 269) of mathematics. For De Millo, Lipton, & Perlisp (1979), PYTHIAGORA’s verification or the computer assisted part of a proof is simply meaningless; it does not contribute to the community’s understanding of mathematics. This is easiest to see in the odd Goldbach conjecture: what understanding do we gain from the $10^{30}$ odd numbers that Helfgott’s computer program checked? It teaches us no new mathematics, no new methods, and brings no extra understanding beyond a verification.

In an alternative world without computer proofs, this verification would be absent. On the one hand, this means that alternative Helfgott would only tighten but not resolve the conjecture. On the other hand, the problem would remain open and continue to keep researchers motivated to find completely analytic ways to resolve it. Of course, even in our real world, a few mathematicians will continue looking for a non-computer assisted proof of the weak Goldbach conjecture, just as they continue to do with the four color theorem. However, the social motivation will be lower and progress slower. This is the curse of computing: giving up understanding for an easy verification.

Of course, I don’t mean this to detract from Helfgott’s breakthrough, the real danger of the curse is not among mathematicians but among theorists and modelers. Instead of the mathematical reasoning and deep understanding encouraged by biologists, economists, and physicists of the past, theorists today are increasingly satisfied with computer simulations. In fields where the simulations can be directly linked to experiment (physics, chemistry, etc) this is largely alright. In biology, economics, and the social sciences, however, the link of simulation to experiment is weak or nonexistent. Models serve as heuristics to inform largely verbal theories. They are built to pump out surprising results but not to extend understanding. Ironically, the co-founder of the Santa Fe Institute (the home of agent-based modeling and complexology) Murray Gell-Mann best states the computing dilemma:

What is the point of studying a complex system that we don’t understand by making a computer model that we don’t understand?

My pessimistic answer is ‘it’s easy’, writing an ‘abstract’ simulation is much simpler than trying to pursue an analytic result. If the simulations were actually treated as heuristics and followed up by rigorous treatments then this ease would not be a problem. However, it seems that in some fields simulations are increasingly discouraging follow up work. Once a computational model hints at a previously paradoxical effect, it often spells doom for understanding. Follow up papers that extend beyond number crunching to show general results are faced by the wall of “this isn’t novel” and rejected. The curse of computing is settling for examples or easy verification at the expense of a slower but deeper understanding. As computer scientists start to rewrite their COMP-101 courses to remove the Goldbach conjecture as a motivation for the halting problem, I hope that in its place they will stress replacing heuristic models by rigorous understanding.

### References

Appel, K., & Haken, W. (1977). Every planar map is four colorable. Part I. Discharging. Illinois J. Math 21: 429–490.

Appel, K., Haken, W., & Koch, J. (1977). Every planar map is four colorable. Part II. Reducibility. Illinois J. Math 21: 491–567.

Helfgott, H.A. (2012). Minor arcs for Goldbach’s problem. arXiv preprint 1205.5252

Helfgott, H.A. (2013). Major arcs for Goldbach’s problem. arXiv preprint 1305.2897

De Millo, R. A., Lipton, R. J., & Perlis, A. J. (1979). Social processes and proofs of theorems and programs. Communications of the ACM, 22(5), 271-280.

Gonthier, G. (2008). The four colour theorem: Engineering of a formal proof. In Computer Mathematics . Springer Berlin Heidelberg. [preprint version]

Rav, Y. (1999). Why Do We Prove Theorems? Philosophia Mathematica, 7 (1), 5-41 DOI: 10.1093/philmat/7.1.5

Tymoczko, T. (1979). The four-color problem and its philosophical significance. The Journal of Philosophy, 76(2): 57-83.

From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

### 30 Responses to Four color problem, odd Goldbach conjecture, and the curse of computing

1. Jacob Scott says:

Great post Artem! Ugh, I can’t tag people. I’ll reply on G+ …

2. Jacob Scott says:

I do want to take sliiiight issue with the end – I think that some models of biology do more than illustrate verbal models, or at least are getting close to doing so. Further, ill link to a recent paper that takes a surprising result from a CA and does some clever analysis to extend it, and tell you about a failed (open?) attempt that I and a collaborator have made to do it again…

3. haig says:

I’m reminded of Danny Hillis’ attempts at evolving a 16 wire sorting network. He got his system to produce one with 61 comparators, with 60 being the smallest ever found previously (and possibly still?), however, he said even after analyzing the code he had no idea how it worked!

The algorithms aren’t doing anything magical, in principle we should be able to undertand what they’re doing. The problem comes from trying to take a top-down view of the end result of the evolution of a very complex system. It is why biology is so hard to crack, whereas physics has succeeded so phenomenally. Extending that analogy, we could use a bioinformatics approach to analyzing such complex programs. Or, even better, since we are present at the birth of such digital species, we could program in breadcrumbs to inform us of what the program is doing along the way. Why couldn’t one program automated proof systems (and other search/optimization algorithms) with constraints that limited the system to produce human readable outputs?

• The genetic algorithm and uninterpretable neural networks are my biggest pet-peeve. Although I don’t think this is what Hillis’ does, it triggered that memory for me and so I have to rant a bit. Genetic algorithms and neural nets are universal learners, and thus the fact that they can eventually fit any data you throw at them (or optimize any function you give them) is nothing impressive. Especially if the task you give them is close to trivial (not the case with Hillis, but the case with most computational psychology work with learning in neural nets). After the net learns, the scientists argues that this produces support for his pet-theory based on vague (and often blatantly incorrect) analogies of neural nets and human brains. If the researchers could properly interpret and decode the representations their networks build, or if they followed computational neuroscience and build legitimately realistic models then things would be alright. As is, however, it is the prototypical example of the curse of computing :(.

I agree with your distinction between top-down and bottom-up views and I think it contributes significantly to the difficulty of biology (and psychology) but not completely. Physicist had much more freedom and creativity in defining relevant macroscopic properties with which to ask questions and describe systems. This freedom allowed them (like mathematicians) to ask questions they could answer, and if they got stuck to adjust the questions further. However, biologists and psychologists tend to lock themselves into specific macroscopic properties that are not well defined (or even inspired) scientifically but instead come from folk-biology/folk-psychology. The most notorious example being consciousness, although some progress is being made to find more reasonable approaches/definitions. Physicists had Newton to break them from folk-physics, I don’t think biology and psychology have quite found their equivalent.

That being said, a bottom up approach doesn’t necessarily mean understanding. Studying agent-based models (just their dynamics properties, not their application to answer questions in other fields) is a typical example of where knowing the inner workings (i.e. bottom-up) doesn’t help at all. Often, the few interesting statements that can be made, are top-down ones based on computational complexity; the universal-learner property mentioned before being one such example.

I definitely agree that computers can be extremely useful for understanding. Unfortunately, the blog post was already too long, and I had to stop writing and give a bit of a misrepresentation of my appreciation of computer assistance; the article paints me as much more of anti-formalist than I really am. I believe that recent work like Gowers and Ganesalingam‘s human-like theorem prover are extremely useful and in the future will help computers or computer+researcher pairs to become extremely better integrated into the broader social fabric of mathematics.

4. E.L. Wisty says:

Reblogged this on Pink Iguana.

5. Nicely written. I agree that research that relies solely on simulation sometimes has a weak connection to real experiments, but that is also true for a lot of analytical results. I think more work needs to be done in reducing the complexity and isolating aspects of the real world systems (societies, economies, organisms etc.) that one want to understand through modelling, and this type of work requires scientists of both theoretical and experimental background.

• Thank you. I agree that both simulations and analytics often lack strong connections to experiment, this is a whole other bag-of-worms that I don’t want to open. I am mostly concerned with how they allow us to gain this very vague concept of “understanding”. I definitely think what we need to work on reducing complexity and isolating real world systems, and I think the best way to approach this is to allow the questions to come from the relevant science and not from folk-science or elsewhere (say policy makers). However, that is a whole other blog post that I hope to write in the near future!

6. valuevar says:

Just to be clear: the meaningful and difficult part of my proof consisted in reducing \$n_0\$ to \$10^{30}\$ (or, in newer versions, to \$10^{27}\$). A brute-force check of small or special cases is something that has often been done, even before computers.

• Thank you for stopping by.

I didn’t do a very good job of explaining your result and I apologize for that. I used it primarily as motivation to talk about a tangentially related issue: the use of simulations. I should probably have written more than my two sentences about the hardest part of your work:

Helfgott refined the method to have ‘such-and-such property’ mean ‘is a sum of 3 primes’, and tightened $n_0$ to be about $10^{30}$. This application of circle method was the math of the proof, and a computer search was used to manually establishing a lemma for $n < 10^{30}$.

Unfortunately, I know next to no number theory, and so I would not be able to explain it. I am also aware that brute-force checks have been done before (and discuss the famous one for 4CT), but my real question was about what we learn from brute force checks:

This is easiest to see in the odd Goldbach conjecture: what understanding do we gain from the $10^{30}$ odd numbers that Helfgott’s computer program checked? It teaches us no new mathematics, no new methods, and brings no extra understanding beyond a verification.

Your refinement of the circle method does give us new mathematics, and new understanding; however, the easier part of brute-force verification of the finite cases left over only works to ‘establish’ the theorem, without teaching us more than what the “meaning and difficult part” of your work taught us. I am not trying to down-play your proof, just point to that the knowledge is in the math part and not the computer verification part. I then use this as analogy (and shameless attempt to be relevant to current events) for thinking about the use of computer simulations in science.

Even if we look at pre-computer examples of large case analyses, these often reflect our inability to see or articulate a common pattern. If we take the simple minded view of math as “the study of patterns” then a case analysis is often an admission of defeat: “we have not come up with an elegant vocabulary to more succinctly describe the myriad of cases we have to now enumerate”.

In this post I had suggested that this sort of case-analysis (for you, the check of cases upto 10^30) is somehow ‘lesser’ than the more ‘enlightening’ mathematical part (for you, the reduction of the problem from an infinite statement to a finite one). However, in the two years since I’ve written this post, my views have softened and now I would just say these are different perspectives and useful for different goals.

Of course, if I did an injustice to your work and/or you want to comment on your work or my thoughts on case-analysis/simulations more publicly than this comment thread (which few beside the two of us will read) then I would be happy to invite you to contribute a guest post on this blog! Although the details of your proof would be a bit more sophisticated than the math that is usually discussed here.