Games, culture, and the Turing test (Part I)
March 7, 2013
Intelligence is one of the most loaded terms that I encounter. A common association is the popular psychometric definition: IQ. For many psychologists, this definition is too restrictive and the g factor is preferred for getting at the ‘core’ of intelligence tests. Even geneticists have latched on to g for looking at the heritability of intelligence, inadvertently helping us see that g might be too general a measure. Still, for some, these tests are not general enough since they miss the emotional aspects of being human, and tests of emotional intelligence have been developed. Unfortunately, the bar for intelligence is a moving one, whether through the Flynn effect in IQ or, more commonly, through constant redefinitions of ‘intelligence’.
Does being good at memorizing make one intelligent? Maybe in the 1800s, but not when my laptop can load Google. Does being good at chess make one intelligent? Maybe before Deep Blue beat Kasparov, but not when my laptop can run a chess program that beats grandmasters. Does being good at Jeopardy make one intelligent? Maybe before IBM Watson easily defeated Jennings and Rutter. The common trend here seems to be that as soon as computers outperform humans at a given task, that task and its associated skills are no longer considered central to intelligence. As such, if you believe that talking about an intelligent machine is reasonable, then you want to agree on an operational benchmark of intelligence that won’t change as you develop your artificial intelligence. Alan Turing did exactly this and launched the field of AI.
I’ve stressed that Turing’s greatest achievement was assembling an algorithmic lens and turning it on the world around him, and previously highlighted its application to biology. In popular culture, he is probably best known for the application of the algorithmic lens to the mind: the Turing test (Turing, 1950). The test has three participants: a judge, a human, and a machine. The judge uses an instant messaging program to chat with the human and the machine, without knowing which is which. At the end of a discussion (which can be about anything the judge desires), she has to determine which is man and which is machine. If judges cannot correctly identify the machine more than 50% of the time, then it is said to pass the test. For Turing, this meant that the machine could “think” and for many AI researchers this is equated with intelligence.
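To make the pass criterion concrete, here is a minimal Python sketch. It assumes each session is summarized by a single boolean (whether the judge correctly picked out the machine) and treats the 50% figure from the description above as a hard cutoff; the function names and the lack of any statistical test are my own simplifications, not part of Turing's proposal.

```python
from typing import Sequence


def identification_rate(verdicts: Sequence[bool]) -> float:
    """Fraction of sessions in which the judge correctly identified the machine."""
    if not verdicts:
        raise ValueError("need at least one session")
    return sum(verdicts) / len(verdicts)


def passes_turing_test(verdicts: Sequence[bool], threshold: float = 0.5) -> bool:
    """The machine passes if judges are right no more often than `threshold`.

    The default of 0.5 is chance level for a two-way choice (assumed cutoff).
    """
    return identification_rate(verdicts) <= threshold


# Hypothetical verdicts from four sessions: the judge was right twice.
sessions = [True, False, False, True]
print(identification_rate(sessions), passes_turing_test(sessions))
```

With only a handful of sessions the observed rate is very noisy, so a real evaluation would need many judges and some measure of uncertainty around the 50% line.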
You might have noticed a certain arbitrariness in the chosen mode of communication between judge and candidates. Text-based chat seems to be a very general mode, but is general always better? Instead, we could just as easily define a psychometric Turing test by restricting the judge to only give IQ tests. Strannegård and co-authors did this by designing a program that could be tested on the mathematical sequences part of IQ tests (Strannegård, Amirghasemi, & Ulfsbäcker, 2013) and Raven’s progressive matrices (Strannegård, Cirillo, & Ström, 2012). The authors’ anthropomorphic method could match humans on either task (IQ of 100) and, on the mathematical sequences, greatly outperform most humans if desired (IQ of 140+). In other words, a machine can pass the psychometric Turing test, and if IQ is a valid measure of intelligence then your laptop is probably smarter than you.
Of course, there is no reason to stop restricting our mode of communication. A natural continuation is to switch to the domain of game theory. The judge sets a two-player game for the human and computer to play. To decide which player is human, the judge only has access to the history of actions the players chose. This is the economic Turing test suggested by Boris Bukh and shared by Ariel Procaccia. The test can be viewed as part of the program of linking intelligence and rationality.
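As a rough illustration of what the judge in the economic Turing test has to work with, here is a small Python sketch. Everything in it is an assumption made for illustration: the prisoner's dilemma payoffs, the sample history, and especially the judging heuristic, which naively guesses that the human is whichever seat deviated more often from a best response to the other player's action.

```python
from typing import Dict, List, Optional, Set, Tuple

Action = str
# payoffs[(a0, a1)] = (payoff to seat 0, payoff to seat 1)
Payoffs = Dict[Tuple[Action, Action], Tuple[float, float]]
History = List[Tuple[Action, Action]]


def best_responses(payoffs: Payoffs, seat: int, opponent_action: Action) -> Set[Action]:
    """Actions that maximize `seat`'s payoff against a fixed opponent action."""
    actions = sorted({pair[seat] for pair in payoffs})

    def payoff(action: Action) -> float:
        pair = (action, opponent_action) if seat == 0 else (opponent_action, action)
        return payoffs[pair][seat]

    best = max(payoff(a) for a in actions)
    return {a for a in actions if payoff(a) == best}


def deviation_count(payoffs: Payoffs, history: History, seat: int) -> int:
    """Number of rounds in which `seat` did not play a best response."""
    return sum(
        1
        for pair in history
        if pair[seat] not in best_responses(payoffs, seat, pair[1 - seat])
    )


def guess_human_seat(payoffs: Payoffs, history: History) -> Optional[int]:
    """Naive judge (assumed heuristic): the less 'rational' seat is the human."""
    d0 = deviation_count(payoffs, history, 0)
    d1 = deviation_count(payoffs, history, 1)
    if d0 == d1:
        return None  # the history gives this judge no basis for a guess
    return 0 if d0 > d1 else 1


# Hypothetical prisoner's dilemma payoffs and a hypothetical action history.
pd_payoffs: Payoffs = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
history: History = [("C", "D"), ("C", "D"), ("D", "D")]
print(guess_human_seat(pd_payoffs, history))
```

The judge here only ever sees `history`, which matches the restriction in the test. A programmer who knows the judge uses this heuristic could of course defeat it by adding deviations to the machine's play, which is one way to see why both roles are hard.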
Procaccia raises the good point that in this game it is not clear whether it is more difficult to program the computer or to be the judge. Before the work of Tversky & Kahneman (1974), a judge would not even know how to distinguish a human from a rational player. Forty years later, I still don’t know of a reliable survey or meta-analysis of well-controlled experiments on human behavior in the restricted case of one-shot perfect information games. But we do know that judge-designed payoffs are not the only source of variation in human strategies, and I have even suggested the subjective-rationality framework as a way to use evolutionary game theory to study these deviations from objective rationality. Understanding these departures is far from a settled question for psychologists and behavioral economists. In many ways, the programmer in the economic Turing test is a job description for a researcher in computational behavioral economics, and the judge is an experimental psychologist. Both tasks are incredibly difficult.
For me, the key limitation of the economic (and similarly, standard) Turing test is not the difficulty of judging. The fundamental flaw is the assumption that game behavior is a human universal. Much like the unreasonable assumption of objective rationality, we cannot simply assume uniformity in the heuristics and biases that shape human decision making. Before we take anything as general or universal, we have to show its consistency not only across the participants we chose, but also across different demographics and cultures. Unfortunately, much of game behavior (for instance, the irrational concept of fairness) is not consistent across cultures, even if it is largely consistent within a single culture. What a typical Western university student considers a reasonable offer in the ultimatum game is not typical for a member of the Hadza group of Tanzania or the Lamalera of Indonesia (Henrich et al., 2001). Game behavior is not a human universal, but is highly dependent on culture. We will discuss this dependence in part II of this series, and explore what it means for the Turing test and evolutionary game theory.
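For readers who have not met the ultimatum game: a proposer offers part of a fixed pot to a responder, who either accepts (both keep their shares) or rejects (both get nothing). The Python sketch below encodes just those rules. The `min_acceptable` threshold is a hypothetical parameter standing in for whatever a responder considers a reasonable offer; the numbers in the usage lines are made up for illustration and are not data from Henrich et al. (2001).

```python
from typing import Tuple


def ultimatum_payoffs(total: float, offer: float, min_acceptable: float) -> Tuple[float, float]:
    """Return (proposer payoff, responder payoff) for one ultimatum game.

    The responder accepts any offer of at least `min_acceptable`
    (a hypothetical, culture-dependent threshold) and rejects otherwise.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and the total pot")
    if offer >= min_acceptable:
        return (total - offer, offer)
    return (0.0, 0.0)  # rejection leaves both players with nothing


# The same low offer, met by two responders with different (made-up) thresholds.
print(ultimatum_payoffs(10, 2, min_acceptable=0))  # accepts anything
print(ultimatum_payoffs(10, 2, min_acceptable=3))  # rejects offers below 3
```

An objectively rational responder corresponds to a threshold of zero, since any positive amount beats nothing. The point of the cross-cultural comparison is that real thresholds and typical offers are neither zero nor the same everywhere.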
Until next time, I leave you with some questions that I wish I knew the answer to: Can we ever define intelligence? Can intelligence be operationalized? Do universals that are central to intelligence exist? Is intelligence a cultural construct? If there are intelligence universals, then how should we modify the mode of interface used by the Turing test to focus only on them?
This post continues with a review of Henrich et al. (2001) in Part II.
Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91 (2), 73-78.
Strannegård, C., Amirghasemi, M., & Ulfsbäcker, S. (2013). An anthropomorphic method for number sequence problems. Cognitive Systems Research, 22-23, 27-34. DOI: 10.1016/j.cogsys.2012.05.003
Strannegård, C., Cirillo, S., & Ström, V. (2012). An anthropomorphic method for progressive matrix problems. Cognitive Systems Research.
Turing, A. M. (1950). Computing Machinery and Intelligence. Mind.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185 (4157), 1124-1131.