## Cooperation through useful delusions: quasi-magical thinking and subjective utility

Economists who take bounded rationality seriously treat their research like a chess game and follow the reductive approach: start with all the pieces — a fully rational agent — and kill/capture/remove pieces until the game ends, i.e. see what sort of restrictions can be placed on the agents so that they deviate from rationality and better reflect human behavior. Sometimes these restrictions can be linked to evolution, but usually the models are independent of evolutionary arguments. In contrast, evolutionary game theory has traditionally played Go and concerned itself with the simplest agents, capable only of behaving according to a fixed strategy specified by their genes — no learning, no reasoning, no built-in rationality. If evolutionary game theorists want to approximate human behavior then they have to place new stones and take a constructive approach: start with genetically predetermined agents and build them up to better reflect the richness and variety of human (or even other animal) behaviors (McNamara, 2013). I've always preferred Go over chess, and so I am partial to the constructive approach toward rationality. I like to start with replicator dynamics and work my way up, adding agency, perception and deception, ethnocentrism, or emotional profiles and general conditional behavior.

Most recently, my colleagues and I have been interested in the relationship between evolution and learning, both individual and social. A key realization has been that evolution takes cues from an external reality, while learning is guided by a subjective utility, and there is no a priori reason for those two incentives to align. As such, we can have agents acting rationally on their genetically specified subjective perception of the objective game. To avoid making assumptions about how agents might deal with risk, we want them to know a probability that others will cooperate with them. However, this depends on the agent’s history and local environment, so each agent should learn these probabilities for itself. In our previous presentation of results we concentrated on the case where the agents were rational Bayesian learners, but we know that this is an assumption not justified by evolutionary models or observations of human behavior. Hence, in this post we will explore the possibility that agents can have learning peculiarities like quasi-magical thinking, and how these peculiarities can co-evolve with subjective utilities.

## Baldwin effect and overcoming the rationality fetish

G.G. Simpson and J.M. Baldwin

As I've mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn't surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a "perfect Bayesian reasoner as a fixed point of Darwinian evolution". This lets them side-step observed non-Bayesian behavior in humans by saying that we are evolving towards, but haven't yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is vulnerable to critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin and then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that "[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters" (Simpson, 1953). More explicitly, it consists of a three-step process (some steps can occur in parallel or overlap):

1. Organisms adapt to the environment individually.
2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan, 1896; Osborn, 1896, 1897) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson's restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers' paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.

## Bounded rationality: systematic mistakes and conflicting agents of mind

Before her mother convinced her to be a doctor, my mother was a ballerina. As a result, whenever I tried to blame some external factor for my failures, I was met with my mother’s favorite aphorism: a bad dancer’s shoes are always too tight.

“Ahh, another idiosyncratic story about the human side of research,” you note, “why so many?”

Partially these stories are meant to broaden TheEGG blog's appeal, and to lull you into a false sense of security before overrunning you with mathematics. Partially they are a homage to the blogs that inspired me to write, such as Lipton and Regan's "Gödel's Lost Letter and P=NP". Mostly, however, they are there to show that science — like everything else — is a human endeavour with human roots and subject to all the excitement, disappointments, insights, and biases that this entails. Although science is a human narrative, unlike the similar story of pseudoscience, she tries to recognize and overcome her biases when they hinder her development.

The self-serving bias has been particularly thorny in the decision sciences. Humans, especially individuals with low self-esteem, tend to attribute their successes to personal skill, while blaming their failures on external factors. As you can guess from my mother's words, I struggle with this all the time. When I try to explain the importance of worst-case analysis, algorithmic thinking, or rigorous modeling to biologists and fail, my first instinct is to blame it on the structural differences between the biological and mathematical communities, or on biologists' discomfort with mathematics. In reality, the blame lies with my inability to articulate the merits of my stance, or to provide strong evidence that I can offer any practical biological results. Even more depressing, I might be suffering from a case of interdisciplinitis and promoting a meritless idea while completely failing to connect to the central questions in biology. However, I must maintain my self-esteem, and even from my language here, you can tell that I am unwilling to fully entertain the latter possibility. Interestingly, this sort of bias can propagate from individual researchers into their theories.

One of the difficulties for biologists, economists, and other decision scientists has been coming to grips with observed irrationality in humans and other animals. Why wouldn’t there be a constant pressure toward more rational animals that maximize their fitness? Who is to blame for this irrational behavior? In line with the self-serving bias, it must be that crack in the sidewalk! Or maybe some other feature of the environment.

## Evolving useful delusions to promote cooperation

This joint work with Marcel Montrey and Thomas Shultz combines — in keeping with the interdisciplinary theme of this symposium — ideas from biology, economics, and a little bit of cognitive science; the approach is through applied mathematics. This post is a transcript of a presentation I gave on March 27th and covers part of my presentation today at Swarmfest.

## Quasi-delusions and inequality aversion

Patient M: It's impossible — no one could urinate into that bottle — at least no woman could. I'm furious with her [these are the patient's emphases] and I'm damned if I am going to do it unless she gives me another kind of bottle. It's just impossible to use that little thing.

Analyst: It sounds as if a few minutes of communication with the nurse could clear up the realistic part of the difficulty—is there some need to be angry with the nurse and keep the feeling that she has done something to you?

Patient M: The ‘impossibility’ of using the bottle could be gotten over by using another—or I could use a funnel or a plastic cup and pour it into the bottle. But I just won’t. It makes me so mad. If she wants that sample, she is going to have to solve that problem. [Sheepishly] I know how irrational all this is. The nurse is really a very nice person. I could easily talk to her about this, and/or just bring in my own container. But I am really so furious about it that I put all my logic and knowledge aside and I feel stubborn—I just won’t do it. She [back to the emphasis] can’t make me use that bottle. She gave it to me and it’s up to her to solve the problem.

The above is an excerpt from a session between psychoanalyst Leonard Shengold (1988) and his patient. The focus is on the contrast between M's awareness of her delusion and her continued anger and frustration. Rationally and consciously she knows that there is no reason to be angry at the nurse, and yet some unconscious, emotional impulse pushes her toward feelings that produce a behavior she herself recognizes as irrational. This is a quasi-delusion.

## Quasi-magical thinking and superrationality for Bayesian agents

As part of our objective and subjective rationality model, we want a focal agent to learn the probability that others will cooperate given that the focal agent cooperates ($p$) or defects ($q$). In a previous post we saw how to derive point estimates for $p$ and $q$ (and learnt that they are the means of the Bayesian posteriors under a uniform prior — Laplace's rule of succession):

$p_0 = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$, and $q_0 = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$

where $n_{XY}$ is the number of times Alice displayed behavior $X$ and saw Bob display behavior $Y$. In the above equations, a number like $n_{CD}$ is interpreted by Alice as “the number of times I cooperated and Bob ‘responded’ with a defection”. I put ‘responded’ in quotations because Bob cannot actually condition his behavior on Alice’s action. Note that in this view, Alice is placing herself in a special position of actor, and observing Bob’s behavior in response to her actions; she is failing to put herself in Bob’s shoes. Instead, she can realize that Bob would be interested in doing the same sort of sampling, and interpret $n_{CD}$ more neutrally as “number of times agent 1 cooperates and agent 2 defects”, in this case she will see that for Bob, the equivalent quantity is $n_{DC}$.
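The point estimates above take only a few lines to compute. Here is a minimal sketch (the function name and example counts are my own):

```python
def smoothed_estimates(n_cc, n_cd, n_dc, n_dd):
    """Point estimates of the probability that the partner cooperates, given
    that the focal agent cooperated (p) or defected (q). These are posterior
    means under a uniform prior (Laplace's rule of succession)."""
    p = (n_cc + 1) / (n_cc + n_cd + 2)
    q = (n_dc + 1) / (n_dc + n_dd + 2)
    return p, q

# With no observations, the agent falls back on the prior mean: p = q = 1/2.
# After 3 mutual cooperations and 1 defection in response to cooperation,
# p = (3 + 1) / (3 + 1 + 2) = 2/3.
p, q = smoothed_estimates(3, 1, 0, 0)
```

Note that the add-one smoothing keeps the estimates away from the extremes 0 and 1, so a few early observations never lock the agent into certainty.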

## Quasi-magical thinking and the public good

Cooperation is a puzzle because it is not obvious why cooperation, which is good for the group, is so common, despite the fact that defection is often best for the individual. Though we tend to view this issue through the lens of the prisoner's dilemma, Artem recently pointed me to a paper by Joanna Masel, a mathematical biologist at the University of Arizona, focusing on the public goods game [1]. In this game, each of n players is given 20 tokens and chooses how many of them to contribute to a common pool. Once players have made their decisions, the pool is multiplied by some factor m (where 1 < m < n) and distributed equally back to all players. To optimize the group's payoff, players should take advantage of the pool's multiplicative effect by contributing all of their tokens. However, because a player's share does not depend on the size of their own contribution, contributing nothing is the best individual strategy — the Nash equilibrium: a non-contributor gets a share of the pool in addition to keeping all of the tokens they initially received. This conflict captures the puzzle of cooperation, which in this case is: why do human participants routinely contribute about half of their funds, if never contributing is individually optimal?
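To make the dilemma concrete, here is a minimal sketch of the payoff arithmetic, assuming four players and a multiplier of m = 2 (function and parameter names are my own):

```python
def public_goods_payoffs(contributions, m, endowment=20):
    """Each player keeps their unspent tokens plus an equal share of the
    multiplied common pool."""
    n = len(contributions)
    share = m * sum(contributions) / n
    return [endowment - c + share for c in contributions]

# Full contribution doubles everyone's tokens: each player gets 40.
everyone = public_goods_payoffs([20, 20, 20, 20], m=2)

# But a lone free-rider does better still (50), while the group total shrinks.
free_rider = public_goods_payoffs([0, 20, 20, 20], m=2)
```

Since the free-rider's 50 beats the 40 from contributing, regardless of what the others do, zero contribution is the Nash equilibrium, even though it drops the group total from 160 to 140.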

## Games, culture, and the Turing test (Part II)

This post is a continuation of Part 1 from last week that introduced and motivated the economic Turing test.

Joseph Henrich

When discussing culture, the first person that springs to mind is Joseph Henrich. He is the Canada Research Chair in Culture, Cognition and Coevolution, and Professor at the Departments of Psychology and Economics at the University of British Columbia. My most salient association with him is the cultural brain hypothesis (CBH) that suggests that the human brain developed its size and complexity in order to better transmit cultural information. This idea seems like a nice continuation of Dunbar’s (1998) Social Brain hypothesis (SBH; see Dunbar & Shultz (2007) for a recent review or this EvoAnth blog post for an overview), although I am still unaware of strong evidence for the importance of gene-culture co-evolution — a requisite for CBH. Both hypotheses are also essential to studying intelligence; in animals intelligence is usually associated with (properly normalized) brain size and complexity, and social and cultural structure is usually associated with higher intellect.

To most evolutionary game theorists, Henrich is known not for how culture shapes brain development, but for how behavior in games and concepts of fairness vary across cultures. Henrich et al. (2001) studied the behavior of people from 15 small-scale societies in the prototypical test of fairness: the ultimatum game. They showed great variability in how fairness is conceived, and in what operational results these conceptions produce, across the societies they studied.

In general, the ‘universals’ that researchers learnt from studying western university students were not very universal. The groups studied fell into four categories:

• Three foraging societies,
• Six practicing slash-and-burn horticulture,
• Four nomadic herding groups, and
• Three small-scale farming societies.

These add up to sixteen because the Sangu of Tanzania were split into farmers and herders. In fact, in the full analysis presented in table 1, the authors consider a total of 18 groups: splitting the Hadza of Tanzania into big and small camp, and the villagers of Zimbabwe into unsettled and resettled. Henrich et al. (2001) conclude that neither the Homo economicus model nor the western university student (WEIRD; see Henrich, Heine, & Norenzayan (2010) for a definition and discussion) accurately describes any of these groups. I am not sure why I should trust this result given a complete lack of statistical analysis, small sample sizes, and what seem like arithmetic mistakes in the table (for instance the resettled villagers rejected 12 out of 86 offers, but the authors list the rate as 7%). However, even without a detailed statistical analysis, it is clear that there is a large variance across societies, and at least some of the societies don't match economically rational behavior or the behavior of WEIRD participants.

The ultimatum game is an interaction between two participants; one is randomly assigned to be Alice and the other Bob. Alice is given a couple of days' wages in money (either the local currency or another common unit of exchange like tobacco) and decides what proportion of it to offer to Bob. She can choose to offer as little or as much as she wants. Bob is then told what proportion Alice offered and can decide to accept or reject. If Bob accepts then the game ends and each party receives their fraction of the goods. If Bob rejects then both Alice and Bob receive nothing and the game terminates. The interaction is completely anonymous and happens only once to avoid effects of reputation or direct reciprocity. In this setting, Homo economicus would give the lowest possible offer when playing as Alice and accept any non-zero offer as Bob (any money is better than no money).
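The game's mechanics, and the Homo economicus prediction, can be sketched in a few lines (names and the 100-unit stake are my own illustration):

```python
def ultimatum(stake, offer, accepts):
    """Returns (Alice's payoff, Bob's payoff). If Bob rejects, both get nothing."""
    return (stake - offer, offer) if accepts else (0, 0)

# Homo economicus as Bob accepts any non-zero offer (any money beats no money),
# so Homo economicus as Alice offers the smallest positive amount.
assert ultimatum(100, 1, accepts=True) == (99, 1)
# Rejection destroys the surplus for both players.
assert ultimatum(100, 1, accepts=False) == (0, 0)
```

The cross-cultural data below shows how far real offers and rejections sit from this subgame-perfect prediction.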

The groups that most closely match the economists' model are the Machiguenga of Peru, the Quichua of Ecuador, and the small-camp Hadza. They provide the lowest average offers of 26%-27%, and reject offers 5%, 15%, and 28% of the time, respectively. Only the Tsimane of Bolivia (70 interactions), Achuar of Ecuador (16 interactions), and Ache of Paraguay (51 interactions) have zero offer-rejection rates. However, members of all three societies make sizeable initial offers, averaging 37%, 42%, and 51%, respectively. A particularly surprising group is the Lamelara of Indonesia, who offered on average 58% of their goods and still rejected 3 out of 8 offers (they also rejected 4 out of 20 experimenter-generated low offers, since no low offers were made by group members). This behavior is drastically different from rational behavior, and not very close to that of WEIRD participants, who tend to offer around 50% and reject offers below 20% about 40% to 60% of the time. If we narrow our lens of human behavior to WEIRD participants or economic theorizing then it is easy to miss the big picture of the drastic variability of behavior across human cultures.

It’s easy to see what we want instead of the truth when we focus too narrowly.

What does this mean for the economic Turing test? We cannot assume that the judge is able to distinguish man from machine without also mistaking people of different cultures for machines. Without very careful selection of games, a judge can only distinguish members of her own culture from members of others. Thus, it is not a test of rationality but of conformity to social norms. I expect this flaw to extend to the traditional Turing test as well. Even if we eliminate the obvious cultural barrier of language by introducing a universal translator, I suspect that there will still be cultural norms that force the judge to classify members of other cultures as machines. The operationalization of the Turing test has to be studied carefully for how it interacts with different cultures. More importantly, we need to question whether a universal definition of intelligence is possible, or whether it is inherently dependent on the culture that defines it.

What does this mean for evolutionary game theory? As an evolutionary game theorist, I often take an engineering perspective: pick a departure from objective rationality observed by the psychologists and design a simple model that reproduces this effect. The dependence of game behavior on culture means that I need to introduce a “culture knob” (either as a free or structural parameter) that can be used to tune my model to capture the variance in behavior observed across cultures. This also means that modelers must remain agnostic to the method of inheritance to allow for both genetic and cultural transmission (see Lansing & Cox (2011) for further considerations on how to use EGT when studying culture). Any conclusions or arguments for biological plausibility made from simulations must be examined carefully and compared to existing cross-cultural data. For example, it doesn’t make sense to conclude that fairness is a biologically evolved universal (Nowak, Page, & Sigmund, 2000) if we see such great variance in the concepts of fairness across different cultures of genetically similar humans.

### References

Dunbar, R.I.M. (1998) The social brain hypothesis. Evolutionary Anthropology 6(5): 179-190. [pdf]

Dunbar, R.I.M., & Shultz, S. (2007) Evolution in the Social Brain. Science 317. [pdf]

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91(2), 73-78. DOI: 10.1257/aer.91.2.73

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world. Behavioral and Brain Sciences, 33(2-3), 61-83.

Lansing, J. S., & Cox, M. P. (2011). The Domain of the Replicators. Current Anthropology, 52(1), 105-125.

Nowak, M. A., Page, K. M., & Sigmund, K. (2000). Fairness versus reason in the ultimatum game. Science, 289(5485), 1773-1775.

## Games, culture, and the Turing test (Part I)

Intelligence is one of the most loaded terms that I encounter. A common association is the popular psychometric definition — IQ. For many psychologists, this definition is too restrictive and the g factor is preferred for getting at the ‘core’ of intelligence tests. Even geneticists have latched on to g for looking at heritability of intelligence, and inadvertently helping us see that g might be too general a measure. Still, for some, these tests are not general enough since they miss the emotional aspects of being human, and tests of emotional intelligence have been developed. Unfortunately, the bar for intelligence is a moving one, whether it is the Flynn effect in IQ or more commonly: constant redefinitions of ‘intelligence’.

Does being good at memorizing make one intelligent? Maybe in the 1800s, but not when my laptop can load Google. Does being good at chess make one intelligent? Maybe before Deep Blue beat Kasparov, but not when my laptop can run a chess program that beats grand-masters. Does being good at Jeopardy make one intelligent? Maybe before IBM Watson easily defeated Jennings and Rutter. The common trend here seems to be that as soon as computers outperform humans on a given act, that act and associated skills are no longer considered central to intelligence. As such, if you believe that talking about an intelligent machine is reasonable then you want to agree on an operational benchmark of intelligence that won’t change as you develop your artificial intelligence. Alan Turing did exactly this and launched the field of AI.

I've stressed Turing's greatest achievement as assembling an algorithmic lens and turning it on the world around him, and previously highlighted its application to biology. In the popular culture, he is probably best known for the application of the algorithmic lens to the mind — the Turing test (Turing, 1950). The test has three participants: a judge, a human, and a machine. The judge uses an instant messaging program to chat with the human and the machine, without knowing which is which. At the end of a discussion (which can be about anything the judge desires), she has to determine which is man and which is machine. If judges cannot distinguish the machine more than 50% of the time then it is said to pass the test. For Turing, this meant that the machine could "think", and for many AI researchers this is equated with intelligence.

You might have noticed a certain arbitrariness in the chosen mode of communication between judge and candidates. Text-based chat seems to be a very general mode, but is general always better? Instead, we could just as easily define a psychometric Turing test by restricting the judge to only giving IQ tests. Strannegård and co-authors did this by designing a program that could be tested on the mathematical-sequences part of IQ tests (Strannegård, Amirghasemi, & Ulfsbäcker, 2013) and Raven's progressive matrices (Strannegård, Cirillo, & Ström, 2012). The authors' anthropomorphic method could match humans on either task (IQ of 100) and, on the mathematical sequences, greatly outperform most humans if desired (IQ of 140+). In other words, a machine can pass the psychometric Turing test, and if IQ is a valid measure of intelligence then your laptop is probably smarter than you.

Of course, there is no reason to stop restricting our mode of communication. A natural continuation is to switch to the domain of game theory. The judge sets a two-player game for the human and computer to play. To decide which player is human, the judge only has access to the history of actions the players chose. This is the economic Turing test suggested by Boris Bukh and shared by Ariel Procaccia. The test can be viewed as part of the program of linking intelligence and rationality.

Procaccia raises the good point that in this game it is not clear if it is more difficult to program the computer or to be the judge. Before the work of Tversky & Kahneman (1974), a judge would not even know how to distinguish a human from a rational player. Forty years later, I still don't know of a reliable survey or meta-analysis of well-controlled experiments on human behavior in the restricted case of one-shot perfect-information games. But we do know that judge-designed payoffs are not the only source of variation in human strategies, and I even suggest the subjective-rationality framework as a way to use evolutionary game theory to study these deviations from objective rationality. Understanding these departures is far from a settled question for psychologists and behavioral economists. In many ways, the programmer in the economic Turing test is a job description for a researcher in computational behavioral economics, and the judge is an experimental psychologist. Both tasks are incredibly difficult.

For me, the key limitation of the economic (and, similarly, the standard) Turing test is not the difficulty of judging. The fundamental flaw is the assumption that game behavior is a human universal. Much like the unreasonable assumption of objective rationality, we cannot simply assume uniformity in the heuristics and biases that shape human decision making. Before we take anything as general or universal, we have to show its consistency not only across the participants we chose, but also across different demographics and cultures. Unfortunately, much of game behavior (for instance, the irrational concept of fairness) is not consistent across cultures, even if it is largely consistent within a single culture. What a typical western university student considers a reasonable offer in the ultimatum game is not typical for a member of the Hadza of Tanzania or the Lamelara of Indonesia (Henrich et al., 2001). Game behavior is not a human universal, but is highly dependent on culture. We will discuss this dependence in part II of this series, and explore what it means for the Turing test and evolutionary game theory.

Until next time, I leave you with some questions that I wish I knew the answer to: Can we ever define intelligence? Can intelligence be operationalized? Do universals that are central to intelligence exist? Is intelligence a cultural construct? If there are intelligence universals, then how should we modify the mode of interface used by the Turing test to focus only on them?

This post continues with a review of Henrich et al. (2001) in Part II.

### References

Henrich, J., Boyd, R., Bowles, S., Camerer, C., Fehr, E., Gintis, H., & McElreath, R. (2001). In Search of Homo Economicus: Behavioral Experiments in 15 Small-Scale Societies. American Economic Review, 91 (2), 73-78

Strannegård, C., Amirghasemi, M., & Ulfsbäcker, S. (2013). An anthropomorphic method for number sequence problems. Cognitive Systems Research, 22-23, 27-34. DOI: 10.1016/j.cogsys.2012.05.003

Strannegård, C., Cirillo, S., & Ström, V. (2012). An anthropomorphic method for progressive matrix problems. Cognitive Systems Research.

Turing, A. M. (1950) Computing Machinery and Intelligence. Mind.

Tversky, A.; Kahneman, D. (1974) Judgment under uncertainty: Heuristics and biases. Science 185 (4157): 1124–1131.

## Extra, Special Need for Social Connections

There is now evidence that the payoffs designed by researchers are not the only source of variation in human strategies in two-player symmetric games. In many cases, discrepancies from behavior predicted by variation in payoffs might be explained by social factors such as avoidance of inequality, desire for social harmony, or likability of the opponent.

An interesting example of a study showing the extra, special need for social connections had a woman in an fMRI scanner playing iterative prisoner's dilemma against either another woman or a computer (Rilling et al., 2002). Dollar payoffs for the row player are shown in Table 1 for each round. The authors compared BOLD signals across cells, conditions, and brain areas of interest. They found no main effects of the actions of either the scanned or non-scanned player. However, the interaction of these two actions was statistically significant. The comparison indicating the importance of social connections contrasted the sum of the blue and pink cells against the two white cells in Table 1. The Nash equilibrium is in the pink cell at mutual defection, as the financial payoff is always better for a defector, whether the other (column) player cooperates or defects. The Pareto optimum, or best overall outcome, is in the blue cell, characterized by mutual cooperation. The BOLD signal was stronger, suggesting more neuronal activity, in these two colored cells with matched actions than in the two white cells with unmatched actions (one player cooperates and the other defects) in several brain areas that have been linked with reward processing: nucleus accumbens, caudate nucleus, ventromedial frontal/orbitofrontal cortex, and rostral anterior cingulate cortex. Because the sums of rewards for this comparison are equal (at $3), this pattern in the BOLD signal cannot be explained by game payoffs per se.

Table 1. Dollar payoffs to the row player in each round.

| $ | Cooperate | Defect |
| --- | --- | --- |
| **Cooperate** | 2 | 0 |
| **Defect** | 3 | 1 |

I computed the proportions of decision combinations against a human partner, averaged across four different conditions. Table 2 shows that the two most common decision combinations are CC and DD, with CC (the highest-payoff matched outcome) being the most common. This preference for identical decisions is reminiscent of Hofstadter's (1985) notion of super-rationality. He assumed that responses to a symmetric game will be the same for all super-rational players. They could find this strategy by maximizing the payoff to each player under the assumption of sameness (i.e. only playing on the diagonal). The only two cells with identical responses are the colored ones — CC and DD. Because the payoff is higher for CC than for DD, super-rational players would both cooperate. If not everyone is super-rational, that could account for variation in the results.

Table 2. Frequency of decision combinations against a human partner.

| Frequency | Cooperate | Defect |
| --- | --- | --- |
| **Cooperate** | .47 | .16 |
| **Defect** | .16 | .20 |

Much was also made of higher BOLD signals in the blue cell (mutual cooperation) than in the other three cells, perhaps because that seemed compatible with the authors' hypothesis that social cooperation is especially rewarding. However, that particular comparison could alternatively be explained by reward size alone, as the average reward is $2 in the blue cell and $1.33 in the other three cells, or by Hofstadter's super-rationality. Nonetheless, the former contrast between the colored and white cells does suggest that something is going on which is not compatible with mere dollar payoffs. This conclusion was buttressed by post-game interviews with scanned participants, who often said that mutual cooperation was particularly satisfying. Defect-cooperate, by comparison, was uncomfortable: it provoked guilt over having profited at another's expense, or the realization that the outcome would provoke subsequent defection by the exploited player. Unfortunately, the authors do not present quantitative data on participant satisfaction with each of the four outcomes.
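Hofstadter's diagonal argument is short enough to spell out explicitly (variable names are my own): a super-rational player assumes everyone reasons identically, restricts attention to the matched outcomes, and picks the better one.

```python
# Row player's dollar payoffs from Table 1, keyed by (row action, column action).
payoffs = {('C', 'C'): 2, ('C', 'D'): 0, ('D', 'C'): 3, ('D', 'D'): 1}

# Super-rational players only consider the diagonal (identical choices),
# since they assume all super-rational players pick the same action.
diagonal = {action: payoffs[(action, action)] for action in ('C', 'D')}

# Mutual cooperation (2) beats mutual defection (1), so both cooperate.
superrational_choice = max(diagonal, key=diagonal.get)  # 'C'
```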

A more general, and very skillful, review of the literature on the neural basis of decision making in symmetric two-player games also supports the idea that designed payoffs may not capture all of the variance (Lee, 2008). Some decisions may be socially motivated. Players may decide what to do on the basis of wanting to enhance or reduce the well-being of their opponent. Such decisions are related to activity in brain circuits implicated in evaluation of rewards.

Among the highlights in Lee’s review:

1. Table 3 shows the row player’s utility function for prisoner’s dilemma adjusted to accommodate aversion to inequality (Fehr & Schmidt, 1999). Parameters $\alpha$ and $\beta$ reflect sensitivity to disadvantageous and advantageous inequality, respectively. When $\beta \geq 1/3$, mutual cooperation and mutual defection both become Nash equilibria (Fehr & Camerer, 2007). More generally, the row player’s utility function can be defined as $U_1 (x) = x_1 - \alpha I_D - \beta I_A$, where $I_D = \max(x_2 - x_1 , 0)$ and $I_A = \max(x_1 - x_2 , 0)$ are inequalities disadvantageous and advantageous to player 1, respectively. It is assumed that $\alpha \geq \beta$ and $1 > \beta \geq 0$. For a given payoff to player 1, $U_1(x)$ is maximal when $x_1 = x_2$, expressing an equality preference. Presumably, some instantiation of this equation could account for the preference for identical decisions in the Rilling et al. study.
2. Aversion to inequality is also evident in other games (e.g., dictator, ultimatum, trust). For example, in the dictator game, a dictator receives an amount of money and can donate some of it to a recipient. Because any donation reduces the payoff to the dictator, the donation amount indexes altruism. Dictators tend to donate about 25% of their money (Camerer, 2003).
3. As predicted by temporal-difference learning algorithms, future rewards are often exponentially discounted during games, ensuring that immediate rewards carry more weight than future rewards (Camerer, 2003; Erev & Roth, 1998; Lee, Conroy, McGreevy, & Barraclough, 2004; Lee, McGreevy, & Barraclough, 2005; Sandholm & Crites, 1996). This could be important in games such as the iterated prisoner’s dilemma, when deciding whether to defect for an immediate reward (if the opponent cooperates) or to cooperate in order to encourage the opponent to cooperate in future rounds.
4. In some such algorithms, fictive reward signals can be generated by inferences based on a person’s knowledge, and these have been shown to influence financial decision making (Lohrenz, McCabe, Camerer, & Montague, 2007).
5. Game payoffs can be estimated by a player mentally simulating hypothetical interactions (Camerer, 2003).
6. In games, reputation and moral character of an opponent influence a player’s tendency to cooperate with that opponent (Delgado, Frank, & Phelps, 2005; Nowak & Sigmund, 1998; Wedekind & Milinski, 2000).
7. In primates, midbrain dopamine neurons encode reward prediction errors (Schultz, 2006; Schultz, Dayan, & Montague, 1997) and decrease their activity when expected reward does not occur (Bayer & Glimcher, 2005; Roesch, Calu, & Schoenbaum, 2007; Schultz, et al., 1997). The following brain areas modulate their activity according to reward variation: amygdala, basal ganglia, posterior parietal cortex, lateral prefrontal cortex, medial frontal cortex, orbitofrontal cortex, striatum, insula. Imaging studies show that many of these brain areas are also active during social decision making (Knutson & Cooper, 2005; O’Doherty, 2004).
8. Some of the variability in social decisions may be due to genetic differences. For example, the minimum acceptable offer in ultimatum games is more similar between monozygotic (MZ) than dizygotic (DZ) twins (Wallace, Cesarini, Lichtenstein, & Johannesson, 2007). The ultimatum game resembles the dictator game in that a proposer offers some of the money to a recipient, but here the recipient can accept or reject the offer; if the offer is rejected, neither player receives money. The mean offer in ultimatum games is about 40%, suggesting that proposers are motivated to avoid rejection.
9. Hormones can influence decisions in social games. For example, oxytocin increases the amount of money invested during trust games (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005). In the trust game, an investor invests a proportion of her own money, which is then multiplied and transferred to a trustee; the trustee decides how much of the transferred money to return to the investor. The amount invested measures the investor’s trust in the trustee, and the amount repaid reflects the trustee’s trustworthiness.
| \$ | Cooperate | Defect |
|----|-----------|--------|
| Cooperate | $2$ | $-3\alpha$ |
| Defect | $3 - 3\beta$ | $1$ |
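The inequality-averse utility from point 1 is easy to check numerically. The sketch below is my own illustration (not code from either paper): it implements $U_1(x) = x_1 - \alpha I_D - \beta I_A$ with the base prisoner’s dilemma payoffs implied by Table 3, and verifies that when $\beta \geq 1/3$ mutual cooperation joins mutual defection as an equilibrium:

```python
def fehr_schmidt(x1, x2, alpha, beta):
    """Fehr-Schmidt (1999) inequality-averse utility for player 1."""
    i_d = max(x2 - x1, 0)  # disadvantageous inequality I_D
    i_a = max(x1 - x2, 0)  # advantageous inequality I_A
    return x1 - alpha * i_d - beta * i_a

# Base prisoner's dilemma payoffs (row, column) implied by Table 3:
# CC gives 2 each, a lone defector gets 3 and the cooperator 0, DD gives 1 each.
payoffs = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
           ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def row_utility(row, col, alpha, beta):
    x1, x2 = payoffs[(row, col)]
    return fehr_schmidt(x1, x2, alpha, beta)

alpha, beta = 0.6, 0.4  # satisfies alpha >= beta and 1 > beta >= 1/3
# Cooperating against a cooperator is now a best response (CC is Nash):
assert row_utility('C', 'C', alpha, beta) >= row_utility('D', 'C', alpha, beta)
# Mutual defection remains an equilibrium for any alpha >= 0 (DD is Nash):
assert row_utility('D', 'D', alpha, beta) >= row_utility('C', 'D', alpha, beta)
```

With $\beta = 0.4$ the temptation payoff drops to $3 - 3\beta = 1.8 < 2$, so unilateral defection against a cooperator no longer pays, which is exactly the Fehr & Camerer (2007) condition $\beta \geq 1/3$.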

In conclusion, evidence is accumulating that objective payoffs designed for two-person, symmetric games may not explain all of the observed psychological phenomena. The door is now open for more serious investigation of social factors. Particularly in humans, and to a lesser extent in other primates, a player’s perceived payoffs may differ from the designed payoffs. Such payoff discrepancies could merit further exploration.

In terms of methodology, two ideas from these two papers stand out. One is the revised utility equation that supports inequality aversion (point 1 in the list of highlights from Lee’s review). The other is the simple technique of asking participants to evaluate the various decision combinations after Rilling et al.’s prisoner’s dilemma games. The utility equation could be helpful in modeling, and post-game questioning would be a cheap way to measure how people evaluate game outcomes.

Taken together, the material reviewed here could be very relevant to our planned research on objective and subjective rationality. Artem’s blog post on the possible discrepancy between objective and subjective rationality postulated that the objective game payoffs used by evolution may differ from human participants’ perceived reward values. Also relevant is Marcel’s post on habitual selfish agents and rationality, in which he points out that apparent irrationality in game playing could result from rational processes being applied to subjectively perceived rewards that differ from reality. The evidence reviewed here (from Rilling et al., 2002 and Lee, 2008) identifies particular ways in which humans’ perceived reward values differ from experimenter-designed game payoffs. Inequality aversion could be a key phenomenon with which to start exploring these discrepancies, particularly in view of the payoff tweaks discussed in point 1 and Table 3.

### References

Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141.

Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. Princeton, NJ: Princeton University Press.

Delgado, M. R., Frank, R. H., & Phelps, E. A. (2005). Perceptions of moral character modulate the neural systems of reward during the trust game. Nature Neuroscience, 8, 1611-1618.

Erev, I., & Roth, A. E. (1998). Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88, 848-881.

Fehr, E., & Camerer, C. F. (2007). Social neuroeconomics: the neural circuitry of social preferences. Trends in Cognitive Sciences, 11, 419-427.

Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114, 817-868.

Hofstadter, D. R. (1985). Metamagical themas. New York: Basic Books.

Knutson, B., & Cooper, J. C. (2005). Functional magnetic resonance imaging of reward prediction. Current Opinion in Neurology, 18, 411-417.

Kosfeld, M., Heinrichs, M., Zak, P., Fischbacher, U., & Fehr, E. (2005). Oxytocin increases trust in humans. Nature, 435, 673-676.

Lee, D. (2008). Game theory and neural basis of social decision making. Nature Neuroscience, 11(4), 404-409. DOI: 10.1038/nn2065

Lee, D., Conroy, M. L., McGreevy, B. P., & Barraclough, D. J. (2004). Reinforcement learning and decision making in monkeys during a competitive game. Brain Research: Cognitive Brain Research, 22, 45-58.

Lee, D., McGreevy, B. P., & Barraclough, D. J. (2005). Learning and decision making in monkeys during a rock-paper-scissors game. Brain Research: Cognitive Brain Research, 25, 416-430.

Lohrenz, T., McCabe, K., Camerer, C. F., & Montague, P. R. (2007). Neural signature of fictive learning signals in a sequential investment task. Proceedings of the National Academy of Sciences USA, 104, 9493-9498.

Nowak, M. A., & Sigmund, K. (1998). Evolution of indirect reciprocity by image scoring. Nature, 393, 573-577.

O’Doherty, J. P. (2004). Reward representation and reward-related learning in the human brain: insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776.

Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35, 395-405.

Roesch, M. R., Calu, D. J., & Schoenbaum, G. (2007). Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience, 10, 1615-1624.

Sandholm, T. W., & Crites, R. H. (1996). Multiagent reinforcement learning in the iterated prisoner’s dilemma. Biosystems, 37, 147-166.

Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology, 57, 87-115.

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.

Wallace, B., Cesarini, D., Lichtenstein, P., & Johannesson, M. (2007). Heritability of ultimatum game responder behavior. Proceedings of the National Academy of Sciences USA, 104, 15631-15634.

Wedekind, C., & Milinski, M. (2000). Cooperation through image scoring in humans. Science, 288, 850-852.