There is now evidence that the payoffs designed by researchers are not the only source of variation in human strategies in two-player symmetric games. In many cases, departures from the behavior predicted by payoffs alone might be explained by social factors such as inequality aversion, a desire for social harmony, or the likability of the opponent.
An interesting example of a study showing this special role of social connections had a woman in an fMRI scanner play iterated prisoner's dilemma against either another woman or a computer (Rilling et al., 2002). Dollar payoffs for the row player are shown in Table 1 for each round. The authors compared BOLD signals across cells, conditions, and brain areas of interest. They found no main effects of the actions of either the scanned or non-scanned player, but the interaction of these two actions was statistically significant. The comparison indicating the importance of social connections contrasted the sum of the blue and red cells in Table 1 against the sum of the two white cells. The Nash equilibrium is in the red cell, at mutual defection, because the financial payoff is always better for a defector whether the other (column) player cooperates or defects. The Pareto optimum, or best overall outcome, is in the blue cell, at mutual cooperation. The BOLD signal was stronger, suggesting more neuronal activity, in these two colored cells with matched actions than in the two white cells with unmatched actions (one player cooperates and the other defects) in several brain areas that have been linked with reward processing: nucleus accumbens, caudate nucleus, ventromedial frontal/orbitofrontal cortex, and rostral anterior cingulate cortex. Because the row player's rewards sum to the same $3 on each side of this comparison, the pattern in the BOLD signal cannot be explained by game payoffs per se.
Table 1: Dollar Payoffs in Prisoner's Dilemma Game (row player's payoff per round; the blue and red cells referenced in the text are marked)

| $ (row player) | Cooperate | Defect   |
|----------------|-----------|----------|
| Cooperate      | $2 (blue) | $0       |
| Defect         | $3        | $1 (red) |
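To make the payoff logic concrete, here is a minimal sketch in Python that checks the properties just described: defection dominates, the matched-action (blue and red) cells sum to the same $3 as the white cells, and the mutual-cooperation cell pays more than the average of the other three. The dictionary encoding is mine; the values are those of Table 1.

```python
# Row player's dollar payoffs from Table 1, keyed as (row action, column action).
# "C" = cooperate, "D" = defect. By symmetry, the column player's payoff
# in cell (a, b) equals the row player's payoff in cell (b, a).
payoff = {("C", "C"): 2, ("C", "D"): 0,
          ("D", "C"): 3, ("D", "D"): 1}

# Defection strictly dominates cooperation for the row player,
# so mutual defection (the red cell) is the unique Nash equilibrium.
assert payoff[("D", "C")] > payoff[("C", "C")]  # 3 > 2 against a cooperator
assert payoff[("D", "D")] > payoff[("C", "D")]  # 1 > 0 against a defector

# The matched-action cells (blue CC plus red DD) and the unmatched cells
# (CD plus DC) give the row player the same $3 in total, so the BOLD
# contrast between them cannot be explained by payoff size.
assert payoff[("C", "C")] + payoff[("D", "D")] == 3
assert payoff[("C", "D")] + payoff[("D", "C")] == 3

# The CC-versus-rest contrast, however, does confound reward size:
# the blue cell pays $2, while the other three cells average $1.33.
rest = [v for k, v in payoff.items() if k != ("C", "C")]
print(sum(rest) / len(rest))  # 1.333...
```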
I computed the proportions of decision combinations against a human partner and averaged them across four different conditions. Table 2 shows that the two most common decision combinations are CC and DD, with CC (which has the highest payoff) the most common. This preference for identical decisions is reminiscent of Hofstadter's (1985) notion of super-rationality. He assumed that all super-rational players will give the same response to a symmetric game. They could find this strategy by maximizing the payoff to each player under the assumption of sameness (i.e., considering only the diagonal of the payoff matrix). The only two cells with identical responses are the colored ones, CC and DD. Because the payoff is higher for CC than for DD, super-rational players would both cooperate. If not everyone is super-rational, that could account for variation in the results. (A minimal sketch of this choice rule follows Table 2.)
Table 2: Average Frequency of Decision Combinations (rows: scanned player; columns: partner)

| Frequency | Cooperate | Defect |
|-----------|-----------|--------|
| Cooperate |           |        |
| Defect    |           |        |
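As a sketch of Hofstadter's choice rule, the following Python snippet restricts attention to the diagonal of a symmetric game and picks the action with the best shared payoff. The function name and encoding are mine, not Hofstadter's.

```python
# A super-rational player assumes every player reasons identically, so only
# the diagonal cells (everyone takes the same action) are reachable; the
# rational choice is then the action with the best diagonal payoff.
def superrational_choice(payoff):
    """Return the action maximizing the shared payoff on the diagonal.

    payoff[(a, b)] is the row player's reward when row plays a, column plays b.
    """
    actions = {a for a, _ in payoff}
    return max(actions, key=lambda a: payoff[(a, a)])

payoff = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
print(superrational_choice(payoff))  # "C": the $2 diagonal beats the $1 diagonal
```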
Much was also made of higher BOLD signals in the blue cell (mutual cooperation) than in the other three cells, perhaps because that seemed compatible with the authors' hypothesis that social cooperation is especially rewarding. However, that particular comparison could alternatively be explained by reward size alone, as the average reward is $2 in the blue cell and $1.33 in the other three cells, or by Hofstadter's super-rationality. Nonetheless, the earlier contrast between the colored and the white cells does suggest that something is going on that is not compatible with mere dollar payoffs. This conclusion was buttressed by post-game interviews with scanned participants, who often said that mutual cooperation was particularly satisfying, whereas defecting against a cooperator was uncomfortable, because it provoked guilt over having profited at another's expense, or the realization that the outcome would provoke subsequent defection by the exploited player. Unfortunately, the authors do not present quantitative data on participant satisfaction with each of the four outcomes.
A more general, and very skillful, review of the literature on the neural basis of decision making in symmetric two-player games also supports the idea that designed payoffs may not capture all of the variance (Lee, 2008). Some decisions may be socially motivated. Players may decide what to do on the basis of wanting to enhance or reduce the well-being of their opponent. Such decisions are related to activity in brain circuits implicated in evaluation of rewards.
Among the highlights in Lee’s review:
- Table 3 shows the row player's utility function for prisoner's dilemma adjusted to accommodate aversion to inequality (Fehr & Schmidt, 1999). Parameters $\alpha$ and $\beta$ reflect sensitivity to disadvantageous and advantageous inequality, respectively. When $\beta > 1/3$, mutual cooperation joins mutual defection as a Nash equilibrium (Fehr & Camerer, 2007): with the Table 1 payoffs, cooperating against a cooperator yields $2$, whereas defecting against a cooperator yields only $3 - 3\beta$. More generally, the row player's utility function can be defined as $U_1(x_1, x_2) = x_1 - \alpha \max(x_2 - x_1, 0) - \beta \max(x_1 - x_2, 0)$, where $x_2 - x_1$ and $x_1 - x_2$ are inequalities disadvantageous and advantageous to player 1, respectively. It is assumed that $\beta \leq \alpha$ and $0 \leq \beta < 1$. For a given payoff $x_1$ to player 1, $U_1$ is maximal when $x_1 = x_2$, expressing an equality preference. Presumably, some instantiation of this equation could account for the preference for identical decisions in the Rilling et al. study. (A worked sketch follows Table 3.)
- Aversion to inequality is also evident in other games (e.g., dictator, ultimatum, trust). For example, in the dictator game, a dictator receives an amount of money and can donate some of it to a recipient. Because any donation reduces the payoff to the dictator, the donation amount indexes altruism. Dictators tend to donate about 25% of their money (Camerer, 2003).
- As predicted by temporal-difference learning algorithms, future rewards are often exponentially discounted during games, so that immediate rewards carry more weight than future ones (Camerer, 2003; Erev & Roth, 1998; Lee, Conroy, McGreevy, & Barraclough, 2004; Lee, McGreevy, & Barraclough, 2005; Sandholm & Crites, 1996). This matters in games such as iterated prisoner's dilemma, when deciding whether to defect for an immediate reward (if the opponent cooperates) or to cooperate in order to encourage the opponent to cooperate in future rounds (a toy discounting calculation follows this list).
- In some such algorithms, fictive reward signals can be generated by inferences based on a person’s knowledge, and these have been shown to influence financial decision making (Lohrenz, McCabe, Camerer, & Montague, 2007).
- Game payoffs can be estimated by a player mentally simulating hypothetical interactions (Camerer, 2003).
- In games, reputation and moral character of an opponent influence a player’s tendency to cooperate with that opponent (Delgado, Frank, & Phelps, 2005; Nowak & Sigmund, 1998; Wedekind & Milinski, 2000).
- In primates, midbrain dopamine neurons encode reward prediction errors (Schultz, 2006; Schultz, Dayan, & Montague, 1997) and decrease their activity when expected reward does not occur (Bayer & Glimcher, 2005; Roesch, Calu, & Schoenbaum, 2007; Schultz, et al., 1997). The following brain areas modulate their activity according to reward variation: amygdala, basal ganglia, posterior parietal cortex, lateral prefrontal cortex, medial frontal cortex, orbitofrontal cortex, striatum, insula. Imaging studies show that many of these brain areas are also active during social decision making (Knutson & Cooper, 2005; O’Doherty, 2004).
- Some of the variability in social decisions may be due to genetic differences. For example, the minimum acceptable offer in ultimatum games is more similar between MZ than DZ twins (Wallace, Cesarini, Lichtenstein, & Johannesson, 2007). The ultimatum game resembles the dictator game in that a proposer offers a portion of the money to a recipient, but here the recipient can accept or reject the offer; if the offer is rejected, neither player receives money. The mean offer in ultimatum games is about 40%, suggesting that proposers are motivated to avoid rejection.
- Hormones can influence decisions in social games. For example, oxytocin increases the amount of money invested during trust games (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005). In the trust game, an investor invests a proportion of her own money; the invested amount is then multiplied by the experimenter and transferred to a trustee, who decides how much of the transferred money to return to the investor. The amount invested measures the investor's trust in the trustee, and the amount repaid reflects the trustee's trustworthiness.
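To make the role of discounting concrete, here is a toy Python calculation, not any of the cited models: it assumes a grim-trigger opponent (who cooperates until its partner defects once, then defects forever) and compares the exponentially discounted return of always cooperating against defecting immediately, using the Table 1 payoffs. The opponent model and the discount factors are illustrative assumptions.

```python
# Toy illustration of exponential discounting in iterated prisoner's dilemma,
# using the Table 1 payoffs. The opponent plays grim trigger: cooperate until
# the first defection, then defect forever. (This opponent model and the
# discount factors below are assumptions for illustration only.)
R, T, P = 2, 3, 1  # reward (CC), temptation (DC), punishment (DD)

def value_always_cooperate(gamma):
    # R every round: R + gamma*R + gamma^2*R + ... = R / (1 - gamma)
    return R / (1 - gamma)

def value_defect_now(gamma):
    # T once, then mutual defection forever: T + gamma * P / (1 - gamma)
    return T + gamma * P / (1 - gamma)

for gamma in (0.3, 0.5, 0.7):
    print(f"gamma={gamma}: cooperate={value_always_cooperate(gamma):.2f}, "
          f"defect={value_defect_now(gamma):.2f}")
# With these payoffs cooperation wins exactly when gamma > 0.5: the more
# heavily future rewards are discounted (small gamma), the more the
# immediate temptation to defect dominates.
```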
Table 3: Row Player's Payoffs for Prisoner's Dilemma, Adjusted for Inequality Aversion (adapted from Fehr & Camerer, 2007)

| $U_1$     | Cooperate    | Defect      |
|-----------|--------------|-------------|
| Cooperate | $2$          | $-3\alpha$  |
| Defect    | $3 - 3\beta$ | $1$         |
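As a worked sketch of the adjustment shown in Table 3, the following Python snippet applies the Fehr-Schmidt utility above to the Table 1 dollar payoffs and checks which symmetric outcomes are Nash equilibria. The function names and the particular α and β values (chosen on either side of the β = 1/3 threshold) are my illustrative choices.

```python
# Fehr-Schmidt inequality-averse utility for player 1:
# U1(x1, x2) = x1 - alpha*max(x2 - x1, 0) - beta*max(x1 - x2, 0)
def u1(x1, x2, alpha, beta):
    return x1 - alpha * max(x2 - x1, 0) - beta * max(x1 - x2, 0)

# Dollar payoffs (x1, x2) from Table 1 for (row action, column action).
payoff = {("C", "C"): (2, 2), ("C", "D"): (0, 3),
          ("D", "C"): (3, 0), ("D", "D"): (1, 1)}

def is_symmetric_equilibrium(action, alpha, beta):
    # A symmetric profile (action, action) is a Nash equilibrium if a player
    # cannot gain by unilaterally switching; by symmetry, checking the row
    # player suffices.
    other = "D" if action == "C" else "C"
    stay = u1(*payoff[(action, action)], alpha, beta)
    deviate = u1(*payoff[(other, action)], alpha, beta)
    return stay >= deviate

for beta in (0.2, 0.5):  # on either side of the beta = 1/3 threshold
    print(beta, {a: is_symmetric_equilibrium(a, alpha=1.0, beta=beta) for a in "CD"})
# beta = 0.2: only mutual defection is an equilibrium.
# beta = 0.5: mutual cooperation and mutual defection both are.
```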
In conclusion, evidence is accumulating that objective payoffs designed for two-person, symmetric games may not explain all of the observed psychological phenomena. The door is now open for more serious investigation of social factors. Particularly in humans, and to a lesser extent in other primates, a player’s perceived payoffs may differ from the designed payoffs. Such payoff discrepancies could merit further exploration.
In terms of methodology, two ideas from these papers stand out. One is the revised utility equation that captures inequality aversion (the first highlight from Lee's review). The other is the simple technique of asking participants to evaluate the various decision combinations after Rilling et al.'s prisoner's dilemma games. The utility equation could be helpful in modeling, and post-game evaluation would be a cheap way to measure how people actually value game outcomes.
Taken together, the material reviewed here could be very relevant to our planned research on objective and subjective rationality. Artem's blog post on the possible discrepancy between objective and subjective rationality postulated that the objective game payoffs used by evolution may differ from human participants' perceived reward values. Also relevant is Marcel's post on habitual selfish agents and rationality, in which he points out that apparent irrationality in game playing could result from rational processes being applied to subjectively perceived rewards that differ from reality. The evidence reviewed here (from Rilling et al., 2002 and Lee, 2008) identifies particular ways in which humans' perceived reward values differ from experimenter-designed game payoffs. Inequality aversion could be a key phenomenon with which to start exploring these discrepancies, particularly in view of the payoff adjustments discussed in the first highlight and in Table 3.
References
Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141. [NIH html]
Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. Princeton, NJ: Princeton University Press.
Delgado, M. R., Frank, R. H., & Phelps, E. A. (2005). Perceptions of moral character modulate the neural systems of reward during the trust game. Nature Neuroscience, 8, 1611-1618. [pdf]
Erev, I., & Roth, A. E. (1998). Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88, 848-881. [pdf]
Fehr, E., & Camerer, C. F. (2007). Social neuroeconomics: the neural circuitry of social preferences. Trends in Cognitive Sciences, 11, 419-427. [pdf]
Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114, 817-868. [pdf]
Hofstadter, D. R. (1985). Metamagical themas. New York: Basic Books.
Knutson, B., & Cooper, J. C. (2005). Functional magnetic resonance imaging of reward prediction. Current Opinion in Neurology, 18, 411-417. [pdf]
Kosfeld, M., Heinrichs, M., Zak, P., Fischbacher, U., & Fehr, E. (2005). Oxytocin increases trust in humans. Nature, 435, 673-676. [pdf]
Lee, D. (2008). Game theory and neural basis of social decision making. Nature Neuroscience, 11, 404-409. DOI: 10.1038/nn2065 [NIH html]
Lee, D., Conroy, M. L., McGreevy, B. P., & Barraclough, D. J. (2004). Reinforcement learning and decision making in monkeys during a competitive game. Brain Research: Cognitive Brain Research, 22, 45-58. [link]
Lee, D., McGreevy, B. P., & Barraclough, D. J. (2005). Learning and decision making in monkeys during a rock-paper-scissors game. Brain Research: Cognitive Brain Research, 25, 416-430. [link]
Lohrenz, T., McCabe, K., Camerer, C. F., & Montague, P. R. (2007). Neural signature of fictive learning signals in a sequential investment task. Proceedings of the National Academy of Sciences USA, 104, 9493-9498. [link]
Nowak, M. A., & Sigmund, K. (1998). Evolution of indirect reciprocity by image scoring. Nature, 393, 573-577. [pdf]
O’Doherty, J. P. (2004). Reward representation and reward-related learning in the human brain: insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776. [pdf]
Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35, 395-405. [pdf]
Roesch, M. R., Calu, D. J., & Schoenbaum, G. (2007). Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience, 10, 1615-1624. [NIH html]
Sandholm, T. W., & Crites, R. H. (1996). Multiagent reinforcement learning in the iterated prisoner’s dilemma. Biosystems, 37, 147-166. [pdf]
Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology, 57, 87-115. [pdf]
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599. [pdf]
Wallace, B., Cesarini, D., Lichtenstein, P., & Johannesson, M. (2007). Heritability of ultimatum game responder behavior. Proceedings of the National Academy of Sciences USA, 104, 15631-15634. [link]
Wedekind, C., & Milinski, M. (2000). Cooperation through image scoring in humans. Science, 288, 850-852. [link]