Cooperation and the evolution of intelligence

One of the puzzles of evolutionary anthropology is to understand how our brains got to grow so big. At first sight, the question seems like a no brainer (pause for eye-roll): big brains make us smarter, more adaptable and thus result in an obvious increase in fitness, right? The problem is that brains need calories, and lots of them. Though it accounts for only 2% of your total weight, your brain will consume about 20-25% of your energy intake. Furthermore, the brain from behind its barrier doesn’t have access to the same energy resources as the rest of your body, which is part of the reason why you can’t safely starve yourself thin (if it ever crossed your mind).

So maintaining a big brain requires time and resources. For us, the trade-off is obvious, but if you’re interested in human evolutionary history, you must keep in mind that our ancestors did not have access to chain food stores or high fructose corn syrup, nor were they concerned with getting a college degree. They were dealing with a different set of trade-offs and this is what evolutionary anthropologists are after. What is it that our ancestors’ brains allowed them to do so well that warranted such unequal energy allocation?

Interdisciplinitis: Do entropic forces cause adaptive behavior?

Reinventing the square wheel. Art by Mark Fiore of San Francisco Chronicle.

Physicists are notorious for infecting other disciplines. Sometimes this can be extremely rewarding, but most of the time it is silly. I’ve already featured an example where one of the founders of algorithmic information theory completely missed the point of Darwinism; researchers working in statistical mechanics and information theory seem particularly susceptible to interdisciplinitis. The disease is not new; it formed an abscess shortly after Shannon (1948) founded information theory. The clarity of Shannon’s work allowed metaphorical connections between entropy and pretty much anything. Researchers were quick to swell around the idea, publishing countless papers on “Information theory of X” where X is your favorite field deemed in need of a more thorough mathematical grounding.

Ten years later, Elias (1958) drained the pus with surgically precise rhetoric:

The first paper has the generic title “Information Theory, Photosynthesis and Religion” (title courtesy of D. A. Huffman), and is written by an engineer or physicist. It discusses the surprisingly close relationship between the vocabulary and conceptual framework of information theory and that of psychology (or genetics, or linguistics, or psychiatry, or business organization). It is pointed out that the concepts of structure, pattern, entropy, noise, transmitter, receiver, and code are (when properly interpreted) central to both. Having placed the discipline of psychology for the first time on a sound scientific base, the author modestly leaves the filling in of the outline to the psychologists. He has, of course, read up on the field in preparation for writing the paper, and has a firm grasp of the essentials, but he has been anxious not to clutter his mind with such details as the state of knowledge in the field, what the central problems are, how they are being attacked, et cetera, et cetera, et cetera

I highly recommend reading the whole editorial; it is only one page long and a delight of scientific sarcasm. Unfortunately, as any medical professional will tell you, draining the abscess only treats the symptoms, and without a regimen of antibiotics it is difficult to resolve the underlying cause of interdisciplinitis. Occasionally the symptoms flare up, with the most recent being two days ago in the prestigious Physical Review Letters.

Wissner-Gross & Freer (2013) try to push the relationship between intelligence and entropy maximization by suggesting that the human cognitive niche is explained by causal entropic forces. Entropic force is an apparent macroscopic force that depends on how you define the correspondence between microscopic and macroscopic states. Suppose that you have an ergodic system, in other words: every microscopic state is equally likely (or you have a well-behaved distribution over them) and the system transitions between microscopic states at random such that its long term behavior mimics the state distribution (i.e. the ensemble average and time-average distributions are the same). If you define a macroscopic variable, such that some value of the variable corresponds to more microscopic states than other values then when you talk about the system at the macroscopic level, it will seem like a force is pushing the system towards the macroscopic states with larger microscopic support. This force is called entropic because it is proportional to the entropy gradient.
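As a toy illustration of this counting argument (my own sketch, not taken from the paper), take `n_coins` fair coins as the microscopic system and let the macroscopic variable be the number of heads. The function names and the choice of coins are assumptions made purely for illustration. The entropy of a macrostate is the log of the number of microstates that realize it, and the discrete entropy gradient points towards the macrostates with the largest microscopic support:

```python
from math import comb, log


def macrostate_entropy(n_coins: int, heads: int) -> float:
    """Entropy (in nats, with k_B = 1) of the macrostate 'heads':
    the log of the number of coin configurations that realize it."""
    return log(comb(n_coins, heads))


def entropy_gradient(n_coins: int, heads: int) -> float:
    """Central-difference estimate of dS/d(heads); valid for 0 < heads < n_coins."""
    upper = macrostate_entropy(n_coins, heads + 1)
    lower = macrostate_entropy(n_coins, heads - 1)
    return (upper - lower) / 2


if __name__ == "__main__":
    n = 100
    for h in (10, 50, 90):
        # Positive below n/2, zero at n/2, negative above n/2.
        print(h, entropy_gradient(n, h))
```

Nothing is physically pushing the coins; the apparent "force" towards 50 heads is just bookkeeping over which macrostates have more microstates.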

Instead of defining their microstates as configurations of their system, the authors focus on possible paths the system can follow for time $\tau$ into the future. The macroscopic states are then the initial configurations of those paths. They calculate the force corresponding to this micro-macro split and use it as a real force acting on the macrosystem. The result is a dynamics that tends towards configurations where the system has the most freedom for future paths; the physics way of saying that “intelligence is keeping your options open”.

In most cases, directly invoking the entropic force as a real force would be unreasonable, but the authors use a cognitive justification. Suppose that the agent runs a Monte Carlo simulation of paths out to a time horizon $\tau$ and then moves in accordance with the expected results of its simulation; then the agent’s motion would be guided by the entropic force. The authors study the behavior of such an agent in four models: particle in a box, inverted pendulum, a tool use puzzle, and a “social cooperation” puzzle. Unfortunately, these tasks are enough to both falsify the authors’ theory and show that they do not understand the sort of questions behavioral scientists are asking.
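To get a feel for what "keeping your options open" does in the particle-in-a-box setting, here is a deliberately crude sketch. It is not the authors' continuous formulation and it replaces Monte Carlo sampling with exact counting: the box is a discrete line of `WIDTH` cells, the horizon `TAU` is a number of unit steps, and the agent greedily moves to whichever nearby cell admits the most future walks that stay inside the box. All names and parameter values are hypothetical.

```python
from functools import lru_cache

WIDTH = 11  # hypothetical box: cells 0..10
TAU = 7     # hypothetical horizon in steps; for this width it gives a unique best cell


@lru_cache(maxsize=None)
def count_paths(x: int, steps: int) -> int:
    """Number of distinct walks of length `steps` starting at x
    that never leave the box (a stand-in for path entropy)."""
    if steps == 0:
        return 1
    total = 0
    for dx in (-1, 1):
        nxt = x + dx
        if 0 <= nxt < WIDTH:
            total += count_paths(nxt, steps - 1)
    return total


def step(x: int) -> int:
    """Stay put or move one cell, whichever leaves the most future walks open."""
    candidates = [p for p in (x - 1, x, x + 1) if 0 <= p < WIDTH]
    return max(candidates, key=lambda p: count_paths(p, TAU))


def run(x: int, n_steps: int) -> list:
    """Trajectory of the greedy option-maximizing agent."""
    trajectory = [x]
    for _ in range(n_steps):
        x = step(x)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(run(0, 8))  # drifts away from the wall and settles in the middle cell
```

Even in this caricature the agent walks away from the walls and parks itself in the center, which is the behavior the rest of this post takes issue with.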

If you are locked in a small empty (maybe padded, after reading this blog too much) room for an extended amount of time, where would you choose to sit? I would suspect most people would sit in the corner or near one of the walls, where they can rest. That is where I would sit. However, if adaptive behavior is meant to follow Wissner-Gross & Freer (2013) then, like the particle in their first model, you would be expected to remain in the middle of the room. More generally, you could modify any of the authors’ tasks by having the experimenter remove two random objects from the agents’ environment whenever they complete the task of securing a goal object. If these objects are manipulable by the agents, then the authors would predict that the agents would not complete their task, regardless of what the objects are, since there are more future paths with the option to manipulate two objects instead of one. Of course, in a real setting, whether the agents would prefer the objects would depend on what they are (food versus neutral). None of this is built into the theory, so it is hard to take this as the claimed general theory of adaptive behavior. Of course, it could be that the authors leave “the filling in of the outline to the psychologists”.

Do their experiments address any questions psychologists are actually interested in? This is most clearly illustrated by their social cooperation task, which is meant to be an idealization of the following task we can see bonobos accomplishing (first minute of the video):

Yay, bonobos! Is the salient feature of this task that the apes figure out how to get the reward? No, it is actually that bonobos will cooperate in getting the reward regardless of whether it is in the central bin (to be shared between them) or in side bins (for each to grab their own). However, chimpanzees will work together only if the food is in separate bins and not if it is available in the central bin to be split. In the Wissner-Gross & Freer (2013) approach, both conditions would result in the same behavior. The authors are throwing away the relevant details of the model, and keeping the ones that psychologists don’t care about.

The paper seems to be an obtuse way of saying that “agents prefer to maximize their future possibilities”. This is definitely true in some cases, but false in others. However, it is not news to psychologists. Further, the authors’ abstraction misses the features psychologists care about while stressing irrelevant ones. It is a prime example of interdisciplinitis, and raises the main question: how can we avoid making the same mistake?

Since I am a computer scientist (and to some extent, physicist) working on interdisciplinary questions, this is particularly important for me. How can I be a good connector of disciplines? The first step seems to be to publish in journals relevant to the domain of the questions being asked, instead of the domain from which the tools being used originate. Although mathematical tools tend to be more developed in physics than in biology or psychology, the ones used in Wissner-Gross & Freer (2013) are not beyond what you would see in the Journal of Mathematical Psychology. Mathematical psychologists tend to be well versed in the basics of information theory, since it tends to be important for understanding Bayesian inference and machine learning. As such, entropic forces can be easily presented to them in much the same way as I presented them in this post.

References

Elias, P. (1958). Two famous papers. IRE Transactions on Information Theory, 4(3): 99.

Shannon, Claude E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal. 27(3): 379–423.

Wissner-Gross, A.D., & Freer, C.E. (2013). Causal Entropic Forces. Physical Review Letters, 110(16). DOI: 10.1103/PhysRevLett.110.168702

Mathematical Turing test: Readable proofs from your computer

We have previously discussed the finicky task of defining intelligence, but surely being able to do math qualifies? Even if the importance of mathematics in science is questioned by people as notable as E.O. Wilson, surely nobody questions it as an intelligent activity? Mathematical reasoning is not necessary for intelligence, but surely it is sufficient?

Note that by mathematics, I don’t mean number crunching or carrying out a rote computation. I mean the bread and butter of what mathematicians do: proving theorems and solving general problems. As an example, consider the following theorem about metric spaces:

Let X be a complete metric space and let A be a closed subset of X. Then A is complete.

Can you prove this theorem? Would you call someone that can — intelligent? Take a moment to come up with a proof.

Games, culture, and the Turing test (Part II)

This post is a continuation of Part 1 from last week that introduced and motivated the economic Turing test.

Joseph Henrich

When discussing culture, the first person that springs to mind is Joseph Henrich. He is the Canada Research Chair in Culture, Cognition and Coevolution, and Professor at the Departments of Psychology and Economics at the University of British Columbia. My most salient association with him is the cultural brain hypothesis (CBH) that suggests that the human brain developed its size and complexity in order to better transmit cultural information. This idea seems like a nice continuation of Dunbar’s (1998) Social Brain hypothesis (SBH; see Dunbar & Shultz (2007) for a recent review or this EvoAnth blog post for an overview), although I am still unaware of strong evidence for the importance of gene-culture co-evolution — a requisite for CBH. Both hypotheses are also essential to studying intelligence; in animals intelligence is usually associated with (properly normalized) brain size and complexity, and social and cultural structure is usually associated with higher intellect.

To most evolutionary game theorists, Henrich is known not for how culture shapes brain development but for how behavior in games and concepts of fairness vary across cultures. Henrich et al. (2001) studied the behavior of people from 15 small-scale societies in the prototypical test of fairness: the ultimatum game. They showed great variability in how fairness is conceived and in the operational results those conceptions produce across the societies they studied.

In general, the ‘universals’ that researchers learnt from studying western university students were not very universal. The groups studied fell into four categories:

• Three foraging societies,
• Six practicing slash-and-burn horticulture,
• Four nomadic herding groups, and
• Three small-scale farming societies.

These add up to sixteen, since the Sangu of Tanzania were split into farmers and herders. In fact, in the full analysis presented in table 1, the authors consider a total of 18 groups, splitting the Hadza of Tanzania into big and small camps, and the villagers of Zimbabwe into unsettled and resettled. Henrich et al. (2001) conclude that neither Homo economicus nor the western university student (WEIRD; see Henrich, Heine, & Norenzayan (2010) for a definition and discussion) models accurately describe any of these groups. I am not sure why I should trust this result given a complete lack of statistical analysis, small sample sizes, and what seem like arithmetic mistakes in the table (for instance, the resettled villagers rejected 12 out of 86 offers, which is about 14%, but the authors list the rate as 7%). However, even without a detailed statistical analysis it is clear that there is a large variance across societies, and at least some of the societies don’t match economically rational behavior or the behavior of WEIRD participants.
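The two bits of arithmetic in this paragraph are easy to check. The sketch below only uses the numbers quoted above (the four category counts, and 12 rejections out of 86 offers); the variable names are mine.

```python
# Category counts as listed in the post.
groups = {
    "foraging": 3,
    "slash-and-burn horticulture": 6,
    "nomadic herding": 4,
    "small-scale farming": 3,
}
total_groups = sum(groups.values())

# Rejections among the resettled villagers, as quoted from table 1.
rejected, offers = 12, 86
rejection_rate = rejected / offers

if __name__ == "__main__":
    print(total_groups)             # 16
    print(f"{rejection_rate:.1%}")  # 14.0%
```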

The ultimatum game is an interaction between two participants: one is randomly assigned to be Alice and the other is Bob. Alice is given a couple of days’ wages in money (either the local currency or other common units of exchange like tobacco) and can decide what proportion of it to offer to Bob. She can choose to offer as little or as much as she wants. Bob is then told what proportion Alice offered and can decide to accept or reject. If Bob accepts then the game ends and each party receives their fraction of the goods. If Bob declines then both Alice and Bob receive nothing and the game terminates. The interaction is completely anonymous and happens only once to avoid effects of reputation or direct reciprocity. In this setting, Homo economicus would give the lowest possible offer if playing as Alice and accept any non-zero offer as Bob (any money is better than no money).
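The rules above fit in a few lines of code. This is a minimal sketch with hypothetical function names; the `stake` argument stands in for the couple of days’ wages, and `homo_economicus_bob` encodes only the "accept any non-zero offer" responder described above.

```python
def ultimatum_payoffs(stake: float, offer_fraction: float, bob_accepts: bool):
    """Return (alice_payoff, bob_payoff) for a one-shot ultimatum game.

    Alice offers `offer_fraction` of `stake` to Bob. If Bob rejects,
    both players receive nothing.
    """
    if not 0 <= offer_fraction <= 1:
        raise ValueError("offer_fraction must be between 0 and 1")
    if not bob_accepts:
        return (0.0, 0.0)
    bob_share = stake * offer_fraction
    return (stake - bob_share, bob_share)


def homo_economicus_bob(offer_fraction: float) -> bool:
    """The economically rational responder: any money beats no money."""
    return offer_fraction > 0


if __name__ == "__main__":
    offer = 0.25
    print(ultimatum_payoffs(100, offer, homo_economicus_bob(offer)))
    print(ultimatum_payoffs(100, offer, False))
```

A responder who rejects a positive offer gives up `stake * offer_fraction` in order to leave the proposer with nothing, which is exactly the behavior that the rational model rules out and that the field data keep showing.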

The groups that most closely match the economists’ model are the Machiguenga of Peru, Quichua of Ecuador, and small camp Hadza. They provide the lowest average offers of 26%-27%. They reject offers 5%, 15%, and 28% of the time, respectively. Only the Tsimane of Bolivia (70 interactions), Achuar of Ecuador (16 interactions), and Ache of Paraguay (51 interactions) have zero offer rejection rates. However, members of all three societies make sizeable offers, averaging 37%, 42%, and 51%, respectively. A particularly surprising group is the Lamalera of Indonesia, who offered on average 58% of their goods and still rejected 3 out of 8 offers (they also rejected 4 out of 20 experimenter-generated low offers, since no low offers were given by group members). This behavior is drastically different from rational behavior, and not very close to that of WEIRD participants, who tend to offer around 50% and reject offers below 20% about 40% to 60% of the time. If we narrow our lens on human behavior to that of WEIRD participants or economic theorizing, then it is easy for us to miss the big picture of the drastic variability of behavior across human cultures.

It’s easy to see what we want instead of the truth when we focus too narrowly.

What does this mean for the economic Turing test? We cannot assume that the judge is able to decide how to distinguish man from machine without also mistaking people of different cultures for machines. Without very careful selection of games, a judge can only distinguish members of its own culture from members of others. Thus, it is not a test of rationality but of conformity to social norms. I expect this flaw to extend to the traditional Turing test as well. Even if we eliminate the obvious cultural barrier of language by introducing a universal translator, I suspect that there will still be cultural norms that might force the judge to classify members of other cultures as machines. The operationalization of the Turing test has to be carefully studied with regard to how it interacts with different cultures. More importantly, we need to question if a universal definition of intelligence is possible, or if it is inherently dependent on the culture that defines it.

What does this mean for evolutionary game theory? As an evolutionary game theorist, I often take an engineering perspective: pick a departure from objective rationality observed by the psychologists and design a simple model that reproduces this effect. The dependence of game behavior on culture means that I need to introduce a “culture knob” (either as a free or structural parameter) that can be used to tune my model to capture the variance in behavior observed across cultures. This also means that modelers must remain agnostic to the method of inheritance to allow for both genetic and cultural transmission (see Lansing & Cox (2011) for further considerations on how to use EGT when studying culture). Any conclusions or arguments for biological plausibility made from simulations must be examined carefully and compared to existing cross-cultural data. For example, it doesn’t make sense to conclude that fairness is a biologically evolved universal (Nowak, Page, & Sigmund, 2000) if we see such great variance in the concepts of fairness across different cultures of genetically similar humans.

References

Dunbar, R.I.M. (1998). The social brain hypothesis. Evolutionary Anthropology, 6(5): 179-190.

Dunbar, R.I.M., & Shultz, S. (2007). Evolution in the Social Brain. Science, 317.

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91(2), 73-78. DOI: 10.1257/aer.91.2.73

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world. Behavioral and Brain Sciences, 33(2-3), 61-83.

Lansing, J. S., & Cox, M. P. (2011). The Domain of the Replicators. Current Anthropology, 52(1), 105-125.

Nowak, M. A., Page, K. M., & Sigmund, K. (2000). Fairness versus reason in the ultimatum game. Science, 289(5485), 1773-1775.

Games, culture, and the Turing test (Part I)

Intelligence is one of the most loaded terms that I encounter. A common association is the popular psychometric definition — IQ. For many psychologists, this definition is too restrictive and the g factor is preferred for getting at the ‘core’ of intelligence tests. Even geneticists have latched on to g for looking at heritability of intelligence, and inadvertently helping us see that g might be too general a measure. Still, for some, these tests are not general enough since they miss the emotional aspects of being human, and tests of emotional intelligence have been developed. Unfortunately, the bar for intelligence is a moving one, whether it is the Flynn effect in IQ or more commonly: constant redefinitions of ‘intelligence’.

Does being good at memorizing make one intelligent? Maybe in the 1800s, but not when my laptop can load Google. Does being good at chess make one intelligent? Maybe before Deep Blue beat Kasparov, but not when my laptop can run a chess program that beats grand-masters. Does being good at Jeopardy make one intelligent? Maybe before IBM Watson easily defeated Jennings and Rutter. The common trend here seems to be that as soon as computers outperform humans on a given act, that act and associated skills are no longer considered central to intelligence. As such, if you believe that talking about an intelligent machine is reasonable then you want to agree on an operational benchmark of intelligence that won’t change as you develop your artificial intelligence. Alan Turing did exactly this and launched the field of AI.

I’ve stressed Turing’s greatest achievement as assembling an algorithmic lens and turning it on the world around him, and previously highlighted its application to biology. In popular culture, he is probably best known for the application of the algorithmic lens to the mind: the Turing test (Turing, 1950). The test has three participants: a judge, a human, and a machine. The judge uses an instant messaging program to chat with the human and the machine, without knowing which is which. At the end of a discussion (which can be about anything the judge desires), she has to determine which is man and which is machine. If judges cannot distinguish the machine more than 50% of the time then it is said to pass the test. For Turing, this meant that the machine could “think” and for many AI researchers this is equated with intelligence.

You might have noticed a certain arbitrariness in the chosen mode of communication between judge and candidates. Text based chat seems to be a very general mode, but is general always better? Instead, we could just as easily define a psychometric Turing test by restricting the judge to only give IQ tests. Strannegård and co-authors did this by designing a program that could be tested on the mathematical sequences part of IQ tests (Strannegård, Amirghasemi, & Ulfsbäcker, 2013) and Raven’s progressive matrices (Strannegård, Cirillo, & Ström, 2012). The authors’ anthropomorphic method could match humans on either task (IQ of 100) and on the mathematical sequences greatly outperform most humans if desired (IQ of 140+). In other words, a machine can pass the psychometric Turing test and if IQ is a valid measure of intelligence then your laptop is probably smarter than you.

Of course, there is no reason to stop restricting our mode of communication. A natural continuation is to switch to the domain of game theory. The judge sets a two-player game for the human and computer to play. To decide which player is human, the judge only has access to the history of actions the players chose. This is the economic Turing test suggested by Boris Bukh and shared by Ariel Procaccia. The test can be viewed as part of the program of linking intelligence and rationality.

Procaccia raises the good point that in this game it is not clear if it is more difficult to program the computer or be the judge. Before the work of Tversky & Kahneman (1974), a judge would not even know how to distinguish a human from a rational player. Forty years later, I still don’t know of a reliable survey or meta-analysis of well-controlled experiments of human behavior in the restricted case of one-shot perfect information games. But we do know that judge designed payoffs are not the only source of variation in human strategies and I even suggest the subjective-rationality framework as a way to use evolutionary game theory to study these deviations from objective rationality. Understanding these departures is far from a settled question for psychologists and behavioral economists. In many ways, the programmer in the economic Turing test is a job description for a researcher in computational behavioral economics and the judge is an experimental psychologist. Both tasks are incredibly difficult.

For me, the key limitation of the economic (and similarly, standard) Turing test is not the difficulty of judging. The fundamental flaw is the assumption that game behavior is a human universal. Much like the unreasonable assumption of objective rationality, we cannot simply assume uniformity in the heuristics and biases that shape human decision making. Before we take anything as general or universal, we have to show its consistency not only across the participants we chose, but also across different demographics and cultures. Unfortunately, much of game behavior (for instance, the irrational concept of fairness) is not consistent across cultures, even if it has a large consistency within a single culture. What a typical Western university student considers a reasonable offer in the ultimatum game is not typical for a member of the Hadza group of Tanzania or the Lamalera of Indonesia (Henrich et al., 2001). Game behavior is not a human universal, but is highly dependent on culture. We will discuss this dependence in part II of this series, and explore what it means for the Turing test and evolutionary game theory.

Until next time, I leave you with some questions that I wish I knew the answer to: Can we ever define intelligence? Can intelligence be operationalized? Do universals that are central to intelligence exist? Is intelligence a cultural construct? If there are intelligence universals then how should we modify the mode of interface used by the Turing test to focus only on them?

This post continues with a review of Henrich et al. (2001) in Part 2

References

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91 (2), 73-78

Strannegård, C., Amirghasemi, M., & Ulfsbäcker, S. (2013). An anthropomorphic method for number sequence problems. Cognitive Systems Research, 22-23, 27-34. DOI: 10.1016/j.cogsys.2012.05.003

Strannegård, C., Cirillo, S., & Ström, V. (2012). An anthropomorphic method for progressive matrix problems. Cognitive Systems Research.

Turing, A. M. (1950) Computing Machinery and Intelligence. Mind.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157): 1124–1131.