Games, culture, and the Turing test (Part II)

This post continues Part I from last week, which introduced and motivated the economic Turing test.

Joseph Henrich

When discussing culture, the first person who springs to mind is Joseph Henrich. He is the Canada Research Chair in Culture, Cognition and Coevolution, and a Professor in the Departments of Psychology and Economics at the University of British Columbia. My most salient association with him is the cultural brain hypothesis (CBH), which suggests that the human brain developed its size and complexity in order to better transmit cultural information. This idea seems like a natural continuation of Dunbar’s (1998) social brain hypothesis (SBH; see Dunbar & Shultz (2007) for a recent review or this EvoAnth blog post for an overview), although I am still unaware of strong evidence for gene-culture co-evolution, which the CBH requires. Both hypotheses are also essential to studying intelligence: in animals, intelligence is usually associated with (properly normalized) brain size and complexity, and social and cultural structure is usually associated with higher intellect.

To most evolutionary game theorists, Henrich is known not for how culture shapes brain development but for how behavior in games and concepts of fairness vary across cultures. Henrich et al. (2001) studied the behavior of people from 15 small-scale societies in the prototypical test of fairness: the ultimatum game. They found great variability across the societies they studied, both in how fairness is conceived and in the measurable behavior those conceptions produce.

In general, the ‘universals’ that researchers learnt from studying western university students were not very universal. The groups studied fell into four categories:

  • Three foraging societies,
  • Six practicing slash-and-burn horticulture,
  • Four nomadic herding groups, and
  • Three small-scale farming societies.

These add up to sixteen, since the Sangu of Tanzania were split into farmers and herders. In fact, the full analysis presented in Table 1 considers a total of 18 groups, also splitting the Hadza of Tanzania into big and small camps, and the villagers of Zimbabwe into unsettled and resettled. Henrich et al. (2001) conclude that neither the homo economicus model nor the Western university student (WEIRD; see Henrich, Heine, & Norenzayan (2010) for a definition and discussion) model accurately describes any of these groups. I am not sure why I should trust this result, given a complete lack of statistical analysis, small sample sizes, and what seem to be arithmetic mistakes in the table (for instance, the resettled villagers rejected 12 out of 86 offers, but the authors list the rate as 7%). However, even without a detailed statistical analysis it is clear that there is large variance across societies, and that at least some of the societies match neither economically rational behavior nor the behavior of WEIRD participants.
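To make the arithmetic concern concrete, here is a quick check using only the counts quoted above. The helper name `rejection_rate` is my own, and nothing beyond simple division is assumed.

```python
def rejection_rate(rejected: int, offers: int) -> float:
    """Fraction of ultimatum-game offers that responders rejected."""
    if offers <= 0:
        raise ValueError("need at least one offer")
    return rejected / offers


# Resettled villagers of Zimbabwe: 12 rejections out of 86 offers (counts quoted above).
rate = rejection_rate(12, 86)
print(f"computed rate: {rate:.1%}")  # computed rate: 14.0%

# How many rejections would a 7% rate correspond to with 86 offers?
print(round(0.07 * 86))  # 6
```

A listed rate of 7% would correspond to about 6 rejections, roughly half of the 12 reported, so either the count or the rate in the table seems to be off.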

The ultimatum game is an interaction between two participants: one is randomly assigned to be Alice and the other Bob. Alice is given a couple of days’ wages (in the local currency or another common unit of exchange, like tobacco) and decides what proportion of it to offer to Bob; she can offer as little or as much as she wants. Bob is then told what proportion Alice offered and decides whether to accept or reject. If Bob accepts, the game ends and each party receives their share of the goods. If Bob rejects, both Alice and Bob receive nothing and the game ends. The interaction is completely anonymous and happens only once, to avoid effects of reputation or direct reciprocity. In this setting, homo economicus would make the lowest possible non-zero offer when playing as Alice and accept any non-zero offer as Bob (any money is better than no money).
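The rules above fit in a few lines of Python. This is a minimal sketch, assuming the stake is split into a whole number of indivisible units (100 in the example, a hypothetical choice); the function names are my own, and the `economicus_*` functions encode only the textbook prediction, not anything measured by Henrich et al.

```python
def ultimatum_payoffs(stake: int, offer: int, accept: bool) -> tuple[int, int]:
    """(Alice, Bob) payoffs for one anonymous, one-shot ultimatum game.

    stake -- total amount Alice is given, in indivisible units
    offer -- units Alice offers to Bob (0 <= offer <= stake)
    """
    if not 0 <= offer <= stake:
        raise ValueError("offer must lie between 0 and the stake")
    if accept:
        return stake - offer, offer  # the split goes ahead
    return 0, 0  # rejection: both receive nothing


def economicus_accepts(offer: int) -> bool:
    """Homo economicus as Bob: any positive amount beats nothing."""
    return offer > 0


def economicus_offer(stake: int) -> int:
    """Homo economicus as Alice: the offer that maximizes her own payoff,
    anticipating that Bob is also homo economicus."""
    return max(
        range(stake + 1),
        key=lambda o: ultimatum_payoffs(stake, o, economicus_accepts(o))[0],
    )


best = economicus_offer(100)
print(best)                                # 1
print(ultimatum_payoffs(100, best, True))  # (99, 1)
```

Offering a single unit (1% of the stake) is the rational benchmark against which the cross-cultural averages should be read.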

The groups that most closely match the economists’ model are the Machiguenga of Peru, the Quichua of Ecuador, and the small-camp Hadza. They make the lowest average offers, 26% to 27% of the stake, and reject offers 5%, 15%, and 28% of the time, respectively. Only the Tsimane of Bolivia (70 interactions), the Achuar of Ecuador (16 interactions), and the Ache of Paraguay (51 interactions) have zero rejection rates. However, members of all three societies make sizeable offers, averaging 37%, 42%, and 51%, respectively. A particularly surprising group is the Lamalera of Indonesia, who offered on average 58% of their goods and still rejected 3 out of 8 offers (they also rejected 4 out of 20 experimenter-generated low offers, which were introduced because group members made no low offers themselves). This behavior is drastically different from the rational prediction, and not very close to that of WEIRD participants, who tend to offer around 50% and reject offers below 20% about 40% to 60% of the time. If we narrow our lens on human behavior to WEIRD participants or economic theorizing, it is easy to miss the big picture: the drastic variability of behavior across human cultures.

It’s easy to see what we want instead of the truth when we focus too narrowly.

What does this mean for the economic Turing test? We cannot assume that the judge is able to distinguish man from machine without also mistaking people of different cultures for machines. Without very careful selection of games, a judge can only distinguish members of her own culture from members of others. Thus, it is not a test of rationality but of conformity to social norms. I expect this flaw to extend to the traditional Turing test as well. Even if we eliminate the obvious cultural barrier of language by introducing a universal translator, I suspect that there will still be cultural norms that could lead the judge to classify members of other cultures as machines. How the operationalization of the Turing test interacts with different cultures has to be studied carefully. More importantly, we need to ask whether a universal definition of intelligence is possible, or whether it is inherently dependent on the culture that defines it.
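A toy version of this worry, as a sketch: suppose the judge in an ultimatum-game version of the test labels a proposer as human only when the offer looks normal by her own (WEIRD) standard. The norm centre of 50% and the tolerance of 15 percentage points are hypothetical parameters, and applying the rule to group averages rather than to individual offers is a simplification; the averages themselves are the ones quoted above.

```python
def judge_says_human(offer_share: float, norm: float = 0.50,
                     tolerance: float = 0.15) -> bool:
    """Hypothetical culture-bound judge: an offer counts as 'human' only if
    it lies within `tolerance` of the judge's own cultural norm."""
    return abs(offer_share - norm) <= tolerance


# Average offers (share of the stake) quoted above, plus a rational program
# that offers the smallest positive amount (1% of a 100-unit stake).
average_offers = {
    "WEIRD students (approx.)": 0.50,
    "Machiguenga": 0.26,
    "Quichua": 0.27,
    "Tsimane": 0.37,
    "Ache": 0.51,
    "rational program": 0.01,
}

for player, share in average_offers.items():
    verdict = "human" if judge_says_human(share) else "machine"
    print(f"{player:>24}: {verdict}")
```

This judge puts the Machiguenga and Quichua averages in the same bin as the rational program: the rule measures conformity to the judge’s norm, not humanity.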

What does this mean for evolutionary game theory (EGT)? As an evolutionary game theorist, I often take an engineering perspective: pick a departure from objective rationality observed by psychologists and design a simple model that reproduces this effect. The dependence of game behavior on culture means that I need to introduce a “culture knob” (either as a free or a structural parameter) that can be used to tune my model to capture the variance in behavior observed across cultures. This also means that modelers must remain agnostic to the method of inheritance, to allow for both genetic and cultural transmission (see Lansing & Cox (2011) for further considerations on how to use EGT when studying culture). Any conclusions or arguments for biological plausibility drawn from simulations must be examined carefully and compared to existing cross-cultural data. For example, it doesn’t make sense to conclude that fairness is a biologically evolved universal (Nowak, Page, & Sigmund, 2000) if we see such great variance in the concepts of fairness across different cultures of genetically similar humans.
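As a sketch of what a “culture knob” could look like (my own toy, not a model from any of the papers cited here): assume each responder has a minimum acceptable share drawn uniformly from [0, k], where k is the knob, and let the proposer choose the offer that maximizes her expected payoff. This is a best-response calculation rather than an evolutionary dynamic, and the uniform distribution is an assumption chosen for simplicity.

```python
def acceptance_prob(offer: float, knob: float) -> float:
    """Share of responders who accept `offer` (a fraction of the stake),
    assuming minimum acceptable shares are uniform on [0, knob]."""
    if knob <= 0:
        return 1.0  # no culture of rejection: every offer is accepted
    return min(1.0, offer / knob)


def best_offer(knob: float, steps: int = 100) -> float:
    """Proposer's expected-payoff-maximizing offer on a grid of steps + 1 points."""
    offers = [i / steps for i in range(steps + 1)]
    return max(offers, key=lambda x: (1 - x) * acceptance_prob(x, knob))


for knob in (0.0, 0.3, 0.8):
    print(knob, best_offer(knob))
```

Turning the knob from 0 to 0.8 moves the predicted offer from the homo economicus extreme to an even split. In this toy the offer never exceeds 50% for any k, so it could not by itself produce average offers like the 58% reported above; capturing those would need a different, structural knob.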

References

Dunbar, R. I. M. (1998). The social brain hypothesis. Evolutionary Anthropology, 6(5), 179-190.

Dunbar, R. I. M., & Shultz, S. (2007). Evolution in the Social Brain. Science, 317.

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91(2), 73-78. DOI: 10.1257/aer.91.2.73

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world. Behavioral and Brain Sciences, 33(2-3), 61-83.

Lansing, J. S., & Cox, M. P. (2011). The Domain of the Replicators. Current Anthropology, 52(1), 105-125.

Nowak, M. A., Page, K. M., & Sigmund, K. (2000). Fairness versus reason in the ultimatum game. Science, 289(5485), 1773-1775.


Games, culture, and the Turing test (Part I)

Intelligence is one of the most loaded terms that I encounter. A common association is the popular psychometric definition: IQ. For many psychologists, this definition is too restrictive, and the g factor is preferred for getting at the ‘core’ of intelligence tests. Even geneticists have latched on to g to study the heritability of intelligence, and have inadvertently helped us see that g might be too general a measure. Still, for some, these tests are not general enough, since they miss the emotional aspects of being human, and tests of emotional intelligence have been developed. Unfortunately, the bar for intelligence is a moving one, whether through the Flynn effect in IQ or, more commonly, through constant redefinitions of ‘intelligence’.

Does being good at memorizing make one intelligent? Maybe in the 1800s, but not when my laptop can load Google. Does being good at chess make one intelligent? Maybe before Deep Blue beat Kasparov, but not when my laptop can run a chess program that beats grandmasters. Does being good at Jeopardy make one intelligent? Maybe before IBM Watson easily defeated Jennings and Rutter. The common trend seems to be that as soon as computers outperform humans at a given task, that task and its associated skills are no longer considered central to intelligence. As such, if you believe that talking about an intelligent machine is reasonable, then you want to agree on an operational benchmark of intelligence that won’t change as you develop your artificial intelligence. Alan Turing did exactly this and launched the field of AI.

I’ve stressed that Turing’s greatest achievement was assembling an algorithmic lens and turning it on the world around him, and previously highlighted its application to biology. In popular culture, he is probably best known for applying the algorithmic lens to the mind: the Turing test (Turing, 1950). The test has three participants: a judge, a human, and a machine. The judge uses an instant messaging program to chat with the human and the machine, without knowing which is which. At the end of a discussion (which can be about anything the judge desires), she has to determine which is man and which is machine. If judges cannot correctly identify the machine more than 50% of the time, that is, no better than chance, then it is said to pass the test. For Turing, this meant that the machine could “think”, and many AI researchers equate this with intelligence.
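The 50% criterion leaves open how many conversations a judge needs before “better than chance” means anything. One way to operationalize it, as a sketch rather than Turing’s own formulation: count how often judges correctly pick out the machine, and say the machine passes if an exact one-sided binomial test cannot reject chance-level (50%) accuracy. The significance level of 0.05 is an assumption, and the function names are mine.

```python
from math import comb


def p_value_above_chance(correct: int, trials: int) -> float:
    """Exact one-sided binomial p-value for H0: judge accuracy = 0.5,
    against the alternative that judges do better than chance."""
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    tail = sum(comb(trials, i) for i in range(correct, trials + 1))
    return tail / 2 ** trials


def machine_passes(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """The machine passes if judges are not significantly better than chance."""
    return p_value_above_chance(correct, trials) >= alpha


print(round(p_value_above_chance(14, 20), 4))  # 0.0577
print(machine_passes(14, 20))  # True
print(machine_passes(15, 20))  # False
```

With only 20 conversations, judges could be right 70% of the time and the machine would still pass under this rule, so the number of trials matters as much as the threshold.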

Hit Turing right in the test-ees

You might have noticed a certain arbitrariness in the chosen mode of communication between judge and candidates. Text-based chat seems to be a very general mode, but is general always better? We could just as easily define a psychometric Turing test by restricting the judge to giving only IQ tests. Strannegård and co-authors did this by designing a program that could be tested on the number-sequence part of IQ tests (Strannegård, Amirghasemi, & Ulfsbäcker, 2013) and on Raven’s progressive matrices (Strannegård, Cirillo, & Ström, 2012). The authors’ anthropomorphic method could match humans on either task (IQ of 100) and, on number sequences, could greatly outperform most humans if desired (IQ of 140+). In other words, a machine can pass the psychometric Turing test, and if IQ is a valid measure of intelligence then your laptop is probably smarter than you.
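For a feel of what a number-sequence item asks of a program, here is a deliberately naive solver. It is not Strannegård and co-authors’ anthropomorphic method; it is a swapped-in and much simpler technique (repeated finite differences) that extrapolates any sequence as if it were a polynomial. The function name is my own.

```python
def next_term(seq: list[int]) -> int:
    """Guess the next term by building rows of successive differences until a
    row is constant (or has a single entry), then adding back up the table."""
    if not seq:
        raise ValueError("need at least one term")
    rows = [list(seq)]
    while len(rows[-1]) > 1 and len(set(rows[-1])) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    nxt = rows[-1][-1]  # the constant (or only) value of the last row
    for row in reversed(rows[:-1]):
        nxt = row[-1] + nxt
    return nxt


print(next_term([2, 4, 6, 8]))       # 10
print(next_term([1, 4, 9, 16, 25]))  # 36
print(next_term([1, 2, 4, 8]))       # 15, not the 16 a human would expect
```

The last line shows why real test items are harder than they look: a person sees doubling, while pure differencing fits a cubic through the four terms.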

Of course, there is no reason to stop restricting our mode of communication. A natural continuation is to switch to the domain of game theory. The judge sets a two-player game for the human and computer to play. To decide which player is human, the judge only has access to the history of actions the players chose. This is the economic Turing test suggested by Boris Bukh and shared by Ariel Procaccia. The test can be viewed as part of the program of linking intelligence and rationality.

Procaccia raises the good point that in this game it is not clear whether it is more difficult to program the computer or to be the judge. Before the work of Tversky & Kahneman (1974), a judge would not even have known how to distinguish a human from a rational player. Forty years later, I still don’t know of a reliable survey or meta-analysis of well-controlled experiments on human behavior in the restricted case of one-shot perfect-information games. But we do know that judge-designed payoffs are not the only source of variation in human strategies, and I have even suggested the subjective-rationality framework as a way to use evolutionary game theory to study these deviations from objective rationality. Understanding these departures is far from a settled question for psychologists and behavioral economists. In many ways, the programmer in the economic Turing test has the job description of a researcher in computational behavioral economics, and the judge that of an experimental psychologist. Both tasks are incredibly difficult.

For me, the key limitation of the economic (and, similarly, the standard) Turing test is not the difficulty of judging. The fundamental flaw is the assumption that game behavior is a human universal. As with the unreasonable assumption of objective rationality, we cannot simply assume uniformity in the heuristics and biases that shape human decision making. Before we take anything as general or universal, we have to show its consistency not only across the participants we chose, but also across different demographics and cultures. Unfortunately, much of game behavior (for instance, the irrational concept of fairness) is not consistent across cultures, even if it is highly consistent within a single culture. What a typical Western university student considers a reasonable offer in the ultimatum game is not typical for a member of the Hadza of Tanzania or the Lamalera of Indonesia (Henrich et al., 2001). Game behavior is not a human universal, but is highly dependent on culture. We will discuss this dependence in Part II of this series, and explore what it means for the Turing test and evolutionary game theory.

Until next time, I leave you with some questions that I wish I knew the answers to: Can we ever define intelligence? Can intelligence be operationalized? Do universals that are central to intelligence exist? Is intelligence a cultural construct? If there are intelligence universals, then how should we modify the mode of interface used by the Turing test to focus only on them?

This post continues with a review of Henrich et al. (2001) in Part II.

References

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91(2), 73-78.

Strannegård, C., Amirghasemi, M., & Ulfsbäcker, S. (2013). An anthropomorphic method for number sequence problems. Cognitive Systems Research, 22-23, 27-34. DOI: 10.1016/j.cogsys.2012.05.003

Strannegård, C., Cirillo, S., & Ström, V. (2012). An anthropomorphic method for progressive matrix problems. Cognitive Systems Research.

Turing, A. M. (1950). Computing Machinery and Intelligence. Mind.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124-1131.