Mathematical Turing test: Readable proofs from your computer
April 15, 2013
We have previously discussed the finicky task of defining intelligence, but surely being able to do math qualifies? Even if the importance of mathematics in science is questioned by people as notable as E.O. Wilson, surely nobody questions it as an intelligent activity? Mathematical reasoning is not necessary for intelligence, but surely it is sufficient?
Note that by mathematics, I don’t mean number crunching or carrying out a rote computation. I mean the bread and butter of what mathematicians do: proving theorems and solving general problems. As an example, consider the following theorem about metric spaces:
Let X be a complete metric space and let A be a closed subset of X. Then A is complete.
Can you prove this theorem? Would you call someone who can prove it intelligent? Take a moment to come up with a proof.
One of the beauties of mathematics is that theorem proving is a creative and unique endeavour: different individuals can come up with very different proofs and present them in a variety of ways. As an example, consider three proofs of the above theorem:

Consider an arbitrary Cauchy sequence $(x_n)$ in $A$. As $X$ is complete, $(x_n)$ has a limit in $X$. Suppose $x_n \to x$. Because $A$ is closed, $x$ belongs to $A$. We've proved that every Cauchy sequence in $A$ has a limit point in $A$. So $A$ is complete.

Let $(a_n)$ be a Cauchy sequence in $A$. Then, since $X$ is complete, we have that $(a_n)$ converges. That is, there exists $a \in X$ such that $a_n \to a$. Since $A$ is closed in $X$, $(a_n)$ is a sequence in $A$ and $a_n \to a$, we have that $a \in A$. Thus $(a_n)$ converges in $A$ and we are done.

Let $(a_n)$ be a Cauchy sequence in $A$. We want to show that $(a_n)$ tends to a limit in $A$.
Since $A$ is a subset of $X$, $(a_n)$ is a Cauchy sequence in $X$. Since $X$ is complete, $a_n \to a$ for some $a \in X$. Since $A$ is a closed subset of $X$, it must contain all its limit points, so $a \in A$. So $a_n \to a$ in $A$. So $A$ is complete.
Since the theorem is simple, the ideas behind the three proofs are the same; only the presentation differs. How would you rate these proofs on clarity and style?
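For contrast with these human-style arguments, here is how the same theorem looks when stated for a machine. This is a sketch in the Lean proof assistant, assuming its Mathlib library, where (if I recall the API correctly) the lemma `IsClosed.isComplete` proves exactly this statement; the exact lemma name may differ across Mathlib versions.

```lean
import Mathlib.Topology.MetricSpace.Basic

-- A closed subset A of a complete metric space X is complete.
example {X : Type*} [MetricSpace X] [CompleteSpace X]
    {A : Set X} (hA : IsClosed A) : IsComplete A :=
  hA.isComplete
```

Note how far this terse, symbolic register is from the textbook prose of the three proofs above.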
On March 25th, Fields medalist Timothy Gowers asked his blog readers exactly this question about this theorem and four others. The comments flooded with ratings and discussions of the proofs. Nobody suspected that the authors of the proofs were a graduate student, an undergraduate student, and an unsupervised computer. Eight days later, Gowers revealed the identity of the authors but not which author wrote which proof. Instead, he asked his readers to identify which proof was written by an artificial intelligence. What do you think? Can you distinguish mathematician from machine?
Similar to the economic and psychometric Turing tests we saw earlier, this is a restriction on the modality of the standard Turing test. In the standard test, the judge uses an instant messaging program to chat with the human and the machine, without knowing which is which. At the end of a discussion (which can be about anything the judge desires), she has to determine which is man and which is machine. If judges cannot distinguish the machine more than 50% of the time, then it is said to pass the test. In Gowers's mathematical Turing test, instead of asking questions, the judge provides theorem statements and the participants must respond with a proof.
Theorem provers and assistants were among the earliest motivations of artificial intelligence and, in one form or another, have existed for decades. Unfortunately, these computers presented their convoluted arguments in symbolic gibberish whose only human analogue is found on the pages of Russell and Whitehead's Principia Mathematica.
Professional mathematicians, however, usually rely on English mixed with only occasional symbols. In fact, mathematics has a language all its own. A linguist's dream, the language has the basic flavor and ambiguities of English combined with the precision of logical deduction. It provides a way to go beyond the completely artificial syntactic structures of abstract languages without being lost in ambiguity and semantics as colorless green ideas sleep furiously.
Conveniently, Mohan Ganesalingam defended his computational linguistics Ph.D. with a complete analysis of formal mathematical language. In the three years since his graduation, he has been working with Gowers on developing the ideas behind a theorem prover that imitates human mathematicians and writes proofs in the language you might find in textbooks or journals.
The results of the mathematical Turing test are presented to the right. The figure plots the responses of the regular readers of Gowers's Weblog. There are five columns corresponding to the five questions; question 3 is the one at the start of this post. Blue shows the proportion that correctly identified the computer, green shows those that thought the undergraduate was a computer, yellow those that thought the graduate student was a computer, and red those that could not identify any of the three proofs as computer generated. The white line marks 50% performance, and the black line marks 33% performance, the expected result if the judges were guessing randomly. For question 3, at least, the AI passed the mathematical Turing test. For the other four questions, almost half of the respondents could not accurately identify the computer.
These results are preliminary, and shared on the web in the spirit of open science that Gowers has worked hard to promote among mathematicians. The authors have not released their Haskell code, but it seems like they plan to. I can't wait to see the final results, code, and write-up for the project. Will we have to question the relationship between intelligence and mathematical reasoning? Until then, I encourage you to read Gowers's post that presents the results and project in greater detail and reveals which proof is computer generated.
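Since the authors' code is unreleased, any implementation details are guesswork. Purely as a hypothetical illustration in Haskell (their implementation language), here is one way a proof, represented as a list of deduction steps, could be rendered into textbook-style English; the names `Step`, `render`, and `sampleProof` are my own invention, not the authors' design.

```haskell
-- Hypothetical sketch: a proof as a list of steps, each rendered as an
-- English sentence, then joined into a paragraph.
data Step
  = Assume String         -- introduce an object or hypothesis
  | Deduce String String  -- a conclusion and its justification

render :: [Step] -> String
render = unwords . map sentence
  where
    sentence (Assume h)     = "Let " ++ h ++ "."
    sentence (Deduce c why) = "Since " ++ why ++ ", " ++ c ++ "."

-- The proof of the theorem above, encoded as step data:
sampleProof :: String
sampleProof = render
  [ Assume "(a_n) be a Cauchy sequence in A"
  , Deduce "(a_n) converges to some a in X" "X is complete"
  , Deduce "a lies in A, so A is complete" "A is closed"
  ]
```

Loading this in ghci and evaluating `sampleProof` yields a paragraph in the same register as the human-written proofs; the hard part, of course, is finding the steps rather than printing them.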
Ganesalingam, M. (2013). The Language of Mathematics: A Linguistic and Philosophical Investigation. Lecture Notes in Computer Science, 7805. DOI: 10.1007/978-3-642-37012-0
Typos… you keep writing "Growers" instead of "Gowers".
Thank you! I want to blame it on autocorrect or lack of sleep, but really it is just inattentiveness on my part. I’ve fixed it.
Nice post…thanks for writing it up. I’d seen recent references to this work, but until I read your post I didn’t really understand what they were doing.
Today, Ganesalingam and Gowers released an arXiv preprint about this work: "A fully automatic problem solver with human-style output".