Enriching evolutionary games with trust and trustworthiness

Fairly early in my course on Computational Psychology, I like to discuss Box’s (1979) famous aphorism about models: “All models are wrong, but some are useful.” Although Box was referring to statistical models, his comment on truth and utility applies equally well to computational models attempting to simulate complex empirical phenomena. I want my students to appreciate this disclaimer from the start because it avoids endless debate about whether a model is true. Once we agree to focus on utility, we can take a more relaxed and objective view of modeling, with appropriate humility in discussing our own models. Historical consideration of models, and theories as well, should provide a strong clue that replacement by better and more useful models (or theories) is inevitable, and indeed is a standard way for science to progress. In the rapid turnover of computational modeling, this means that the best one could hope for is to have the best (most useful) model for a while, before it is pushed aside or incorporated by a more comprehensive and often more abstract model. In his recent post on three types of mathematical models, Artem characterized such models as heuristic. It is worth adding that the most useful models are often those that best cover (simulate) the empirical phenomena of interest, bringing a model closer to what Artem called insilications.

Ethnocentrism, religion, and austerity: a science poster for the humanities

Artem Kaznatcheev and I presented a poster on May 4th at a highly interdisciplinary conference on religion held at the University of British Columbia. The conference acronym is CERC, which stands for Cultural Evolution of Religion Research Consortium. Most of the 60-some attendees are religion scholars and social scientists from North American and European universities. Many are also participants in a large partnership grant from the Social Sciences and Humanities Research Council of Canada (SSHRC), spearheaded by Ted Slingerland, a scholar of East Asian studies at UBC. Some preliminary conversations with attendees indicated considerable apprehension about how researchers from the humanities and sciences would get on. Many of us are familiar with collaborative difficulties even in our own narrow domains, so skepticism was fairly common.

As far as I know, our poster was the only computer simulation presented at the meeting. We titled it Agent-based modeling of the evolution of “religion”, with scare quotes around religion because of the superficial and off-hand way we treated it. Because we know from experience that simulations can be a tough sell even at a scientific psychology conference, we were curious about whether and how this poster would fly in this broader meeting.

Extra, Special Need for Social Connections

There is now evidence that the payoffs designed by researchers are not the only source of variation in human strategies in two-player symmetric games. In many cases, departures from the behavior predicted by payoffs alone might be explained by social factors such as avoidance of inequality, desire for social harmony, or likability of the opponent.

An interesting example of a study showing this extra, special need for social connections placed a woman in an fMRI scanner playing iterated prisoner's dilemma against either another woman or a computer (Rilling et al., 2002). Dollar payoffs for the row player on each round are shown in Table 1. The authors compared BOLD signals across cells, conditions, and brain areas of interest. They found no main effects of the actions of either the scanned or the non-scanned player. However, the interaction of these two actions was statistically significant. The comparison indicating the importance of social connections contrasted the sum of the blue and red cells in Table 1 against the sum of the two white cells. The Nash equilibrium is in the red cell, at mutual defection, because the financial payoff is always better for a defector whether the other (column) player cooperates or defects. The Pareto optimum, or best overall outcome, is in the blue cell, at mutual cooperation. The BOLD signal was stronger, suggesting more neuronal activity, in the two colored cells with matched actions than in the two white cells with unmatched actions (one player cooperates and the other defects) in several brain areas that have been linked with reward processing: nucleus accumbens, caudate nucleus, ventromedial frontal/orbitofrontal cortex, and rostral anterior cingulate cortex. Because the summed rewards in this comparison are equal (at $3), the pattern in the BOLD signal cannot be explained by game payoffs per se (a quick check follows Table 1).

Table 1: Dollar payoffs to the row player in the prisoner's dilemma game

                 Cooperate   Defect
    Cooperate        2          0
    Defect           3          1

(CC is the blue cell and DD the red cell referred to in the text; the off-diagonal cells, CD and DC, are white.)
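
To make the equal-sum check explicit, here is a minimal Python sketch (ours, not from the original study) that encodes the Table 1 payoffs and verifies that the colored and white cells both sum to $3:

```python
# Row player's dollar payoffs from Table 1, keyed by (row action, column action).
payoff = {
    ('C', 'C'): 2,  # mutual cooperation: blue cell, Pareto optimum
    ('C', 'D'): 0,  # cooperate against a defector: white cell
    ('D', 'C'): 3,  # defect against a cooperator: white cell
    ('D', 'D'): 1,  # mutual defection: red cell, Nash equilibrium
}

matched = payoff[('C', 'C')] + payoff[('D', 'D')]    # colored cells: 2 + 1
unmatched = payoff[('C', 'D')] + payoff[('D', 'C')]  # white cells:   0 + 3
# Equal sums, so the BOLD contrast cannot be driven by the dollars themselves.
assert matched == unmatched == 3
```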

I computed the proportions of decision combinations against a human partner, averaged across the study's four conditions. Table 2 shows that the two most common decision combinations are CC and DD, with CC (which has the highest joint payoff) the most common. This preference for identical decisions is reminiscent of Hofstadter's (1985) notion of super-rationality. He assumed that all super-rational players make the same response to a symmetric game. They can find this strategy by maximizing the payoff to each player under the assumption of sameness (i.e., considering only the diagonal cells). The only two cells with identical responses are the colored ones, CC and DD. Because the payoff is higher for CC than for DD, super-rational players would both cooperate. That not everyone is super-rational could account for the remaining variation in the results (a small sketch of this computation follows Table 2).

Table 2: Average frequency of decision combinations

                 Cooperate   Defect
    Cooperate       .47        .16
    Defect          .16        .20
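
The super-rational choice can be computed directly from the payoff matrix by restricting attention to the diagonal cells; a minimal sketch (our illustration, using the Table 1 payoffs):

```python
# Hofstadter's (1985) super-rational play: all super-rational players choose
# alike, so only the diagonal cells (C,C) and (D,D) are candidate outcomes.
payoff = {('C', 'C'): 2, ('C', 'D'): 0, ('D', 'C'): 3, ('D', 'D'): 1}

superrational = max(('C', 'D'), key=lambda a: payoff[(a, a)])
print(superrational)  # 'C': the diagonal payoff of 2 for CC beats 1 for DD
```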

Much was also made of the higher BOLD signal in the blue cell (mutual cooperation) than in the other three cells, perhaps because that result seemed compatible with the authors' hypothesis that social cooperation is especially rewarding. However, that comparison can alternatively be explained by reward size alone, as the average reward is $2 in the blue cell and $1.33 in the other three cells, or by Hofstadter's super-rationality. Nonetheless, the earlier contrast between the colored and white cells does suggest that something is going on that mere dollar payoffs cannot explain. This conclusion was buttressed by post-game interviews with scanned participants, who often said that mutual cooperation was particularly satisfying. By contrast, defect-cooperate outcomes were uncomfortable: they provoked guilt over having profited at another's expense, or the realization that the outcome would provoke subsequent defection by the exploited player. Unfortunately, the authors do not present quantitative data on participant satisfaction with each of the four outcomes.

A more general, and very skillful, review of the literature on the neural basis of decision making in symmetric two-player games also supports the idea that designed payoffs may not capture all of the variance (Lee, 2008). Some decisions may be socially motivated. Players may decide what to do on the basis of wanting to enhance or reduce the well-being of their opponent. Such decisions are related to activity in brain circuits implicated in evaluation of rewards.

Among the highlights in Lee’s review:

  1. Table 3 shows the row player's utility function for prisoner's dilemma adjusted to accommodate aversion to inequality (Fehr & Schmidt, 1999). Parameters \alpha and \beta reflect sensitivity to disadvantageous and advantageous inequality, respectively. When \beta \geq 1/3, mutual cooperation joins mutual defection as a Nash equilibrium (Fehr & Camerer, 2007). More generally, the row player's utility function can be defined as U_1(x) = x_1 - \alpha I_D - \beta I_A, where I_D = \max(x_2 - x_1, 0) and I_A = \max(x_1 - x_2, 0) are the inequalities disadvantageous and advantageous to player 1, respectively. It is assumed that \alpha \geq \beta and 1 > \beta \geq 0. For a given payoff to player 1, U_1(x) is maximal when x_1 = x_2, expressing a preference for equality. Presumably, some instantiation of this equation could account for the preference for identical decisions in the Rilling et al. study (a runnable sketch follows Table 3).
  2. Aversion to inequality is also evident in other games (e.g., dictator, ultimatum, trust). For example, in the dictator game, a dictator receives an amount of money and can donate some of it to a recipient. Because any donation reduces the payoff to the dictator, the donation amount indexes altruism. Dictators tend to donate about 25% of their money (Camerer, 2003).
  3. As predicted by temporal-difference learning algorithms, future rewards are often exponentially discounted during games, ensuring that immediate rewards carry more weight than future ones (Camerer, 2003; Erev & Roth, 1998; Lee, Conroy, McGreevy, & Barraclough, 2004; Lee, McGreevy, & Barraclough, 2005; Sandholm & Crites, 1996). This matters in games such as iterated prisoner's dilemma, when deciding whether to defect to gain an immediate reward (if the opponent cooperates) or to cooperate in order to encourage the opponent to cooperate in future rounds.
  4. In some such algorithms, fictive reward signals can be generated by inferences based on a person’s knowledge, and these have been shown to influence financial decision making (Lohrenz, McCabe, Camerer, & Montague, 2007).
  5. Game payoffs can be estimated by a player mentally simulating hypothetical interactions (Camerer, 2003).
  6. In games, reputation and moral character of an opponent influence a player’s tendency to cooperate with that opponent (Delgado, Frank, & Phelps, 2005; Nowak & Sigmund, 1998; Wedekind & Milinski, 2000).
  7. In primates, midbrain dopamine neurons encode reward prediction errors (Schultz, 2006; Schultz, Dayan, & Montague, 1997) and decrease their activity when expected reward does not occur (Bayer & Glimcher, 2005; Roesch, Calu, & Schoenbaum, 2007; Schultz, et al., 1997). The following brain areas modulate their activity according to reward variation: amygdala, basal ganglia, posterior parietal cortex, lateral prefrontal cortex, medial frontal cortex, orbitofrontal cortex, striatum, insula. Imaging studies show that many of these brain areas are also active during social decision making (Knutson & Cooper, 2005; O’Doherty, 2004).
  8. Some of the variability in social decisions may be due to genetic differences. For example, the minimum acceptable offer in ultimatum games is more similar between monozygotic (MZ) than dizygotic (DZ) twins (Wallace, Cesarini, Lichtenstein, & Johannesson, 2007). The ultimatum game resembles the dictator game, except that the recipient can accept or reject the proposer's offer; if the offer is rejected, neither player receives any money. The mean offer in ultimatum games is about 40%, suggesting that proposers are motivated to avoid rejection.
  9. Hormones can influence decisions in social games. For example, oxytocin increases the amount of money invested during trust games (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005). In the trust game, an investor sends a proportion of her own money to a trustee; the experimenter multiplies the amount in transit; the trustee then decides how much of the transferred money to return to the investor. The amount invested measures the investor's trust in the trustee, and the amount repaid reflects the trustee's trustworthiness.
Table 3: Row player's utilities for prisoner's dilemma, adjusted for inequality aversion (adapted from Fehr & Camerer, 2007)

                 Cooperate       Defect
    Cooperate        2          -3\alpha
    Defect       3 - 3\beta         1
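
To make point 1 concrete, here is a minimal Python sketch of the Fehr-Schmidt utility applied to the Table 1 dollars, checking which symmetric outcomes are Nash equilibria. It is our illustration; the \alpha and \beta values are arbitrary apart from respecting \alpha \geq \beta and the \beta \geq 1/3 threshold:

```python
# Fehr-Schmidt (1999) inequality-averse utility for player 1 (a minimal sketch;
# the alpha and beta values used below are illustrative, not from the paper).
def utility(x1, x2, alpha, beta):
    """U_1 = x_1 - alpha*max(x_2 - x_1, 0) - beta*max(x_1 - x_2, 0)."""
    disadvantageous = max(x2 - x1, 0)  # I_D: the other player is ahead
    advantageous = max(x1 - x2, 0)     # I_A: player 1 is ahead
    return x1 - alpha * disadvantageous - beta * advantageous

# Dollar payoffs (x1, x2) from Table 1, for each (row, column) action pair.
dollars = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
           ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}

def symmetric_equilibria(alpha, beta):
    """Symmetric action pairs that are Nash equilibria in adjusted utilities."""
    u = {acts: utility(x1, x2, alpha, beta) for acts, (x1, x2) in dollars.items()}
    equilibria = []
    for a in ('C', 'D'):
        other = 'D' if a == 'C' else 'C'
        # By symmetry, (a, a) is an equilibrium if the row player cannot gain
        # by unilaterally switching while the column player keeps playing a.
        if u[(a, a)] >= u[(other, a)]:
            equilibria.append((a, a))
    return equilibria

print(symmetric_equilibria(alpha=0.0, beta=0.0))  # [('D', 'D')]: plain dollars
print(symmetric_equilibria(alpha=0.5, beta=0.4))  # beta >= 1/3: CC and DD both
```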

In conclusion, evidence is accumulating that the objective payoffs designed for two-person symmetric games may not explain all of the observed psychological phenomena. The door is now open for more serious investigation of social factors. Particularly in humans, and to a lesser extent in other primates, a player's perceived payoffs may differ from the designed payoffs. Such payoff discrepancies merit further exploration.

In terms of methodology, two ideas from these two papers stand out. One is the revised utility equation that captures inequality aversion (point 1 in the list of highlights from Lee's review). The other is the simple technique of asking participants to evaluate the various decision combinations after Rilling et al.'s prisoner's dilemma games. The utility equation could be helpful in modeling, and participant ratings would be a cheap way to assess how people evaluate game outcomes.

Taken together, the material reviewed here could be very relevant to our planned research on objective and subjective rationality. Artem's blog post on the possible discrepancy between objective and subjective rationality postulated that the objective game payoffs used by evolution may differ from human participants' perceived reward values. Also relevant is Marcel's post on habitual selfish agents and rationality, in which he points out that apparent irrationality in game playing could result from rational processes being applied to subjectively perceived rewards that differ from reality. The evidence reviewed here (from Rilling et al., 2002, and Lee, 2008) identifies particular ways in which humans' perceived reward values differ from experimenter-designed game payoffs. Inequality aversion could be a key strategy with which to start exploring these phenomena, particularly in view of the payoff adjustments discussed in point 1 and Table 3.

References

Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47, 129-141. [NIH html]

Camerer, C. F. (2003). Behavioral game theory: Experiments in strategic interaction. Princeton, NJ: Princeton University Press.

Delgado, M. R., Frank, R. H., & Phelps, E. A. (2005). Perceptions of moral character modulate the neural systems of reward during the trust game. Nature Neuroscience, 8, 1611-1618. [pdf]

Erev, I., & Roth, A. E. (1998). Predicting how people play games: reinforcement learning in experimental games with unique, mixed strategy equilibria. The American Economic Review, 88, 848-881. [pdf]

Fehr, E., & Camerer, C. F. (2007). Social neuroeconomics: the neural circuitry of social preferences. Trends in Cognitive Sciences, 11, 419-427. [pdf]

Fehr, E., & Schmidt, K. M. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114, 817-868. [pdf]

Hofstadter, D. R. (1985). Metamagical themas. New York: Basic Books.

Knutson, B., & Cooper, J. C. (2005). Functional magnetic resonance imaging of reward prediction. Current Opinion in Neurology, 18, 411-417. [pdf]

Kosfeld, M., Heinrichs, M., Zak, P., Fischbacher, U., & Fehr, E. (2005). Oxytocin increases trust in humans. Nature, 435, 673-676. [pdf]

Lee, D. (2008). Game theory and neural basis of social decision making. Nature Neuroscience, 11(4), 404-409. DOI: 10.1038/nn2065 [NIH html]

Lee, D., Conroy, M. L., McGreevy, B. P., & Barraclough, D. J. (2004). Reinforcement learning and decision making in monkeys during a competitive game. Brain Research: Cognitive Brain Research, 22, 45-58. [link]

Lee, D., McGreevy, B. P., & Barraclough, D. J. (2005). Learning and decision making in monkeys during a rock-paper-scissors game. Brain Research: Cognitive Brain Research, 25, 416-430. [link]

Lohrenz, T., McCabe, K., Camerer, C. F., & Montague, P. R. (2007). Neural signature of fictive learning signals in a sequential investment task. Proceedings of the National Academy of Sciences USA, 104, 9493-9498. [link]

Nowak, M. A., & Sigmund, K. (1998). Evolution of indirect reciprocity by image scoring. Nature, 393, 573-577. [pdf]

O’Doherty, J. P. (2004). Reward representation and reward-related learning in the human brain: insights from neuroimaging. Current Opinion in Neurobiology, 14, 769-776. [pdf]

Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35, 395-405. [pdf]

Roesch, M. R., Calu, D. J., & Schoenbaum, G. (2007). Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nature Neuroscience, 10, 1615-1624. [NIH html]

Sandholm, T. W., & Crites, R. H. (1996). Multiagent reinforcement learning in the iterated prisoner’s dilemma. Biosystems, 37, 147-166. [pdf]

Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology, 57, 87-115. [pdf]

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599. [pdf]

Wallace, B., Cesarini, D., Lichtenstein, P., & Johannesson, M. (2007). Heritability of ultimatum game responder behavior. Proceedings of the National Academy of Sciences USA, 104, 15631-15634. [link]

Wedekind, C., & Milinski, M. (2000). Cooperation through image scoring in humans. Science, 288, 850-852. [link]

Fewer Friends, More Cooperation

Cooperation is fundamental to all social and biological systems. If cells did not cooperate, multi-cellular organisms would never have evolved [1]. If people did not cooperate, there would be no nation states [2]. But this wide-scale cooperation is somewhat of a mystery from the perspective of Darwinian evolution, which would seem to favor competition for scarce resources and reproductive success over cooperation. Cooperation incurs a cost to provide a benefit elsewhere. Indeed, a basic finding in agent-based computer simulations with unstructured populations is that evolution favors defection over cooperation [3]. As a result, there has been intensive research into spatially structured populations, attempting to explain the pervasive cooperation seen in nature. Under certain realistic spatial conditions, an agent is more likely to encounter members of its own gene pool than would be expected by chance; and this allows cooperation to evolve [4].

A particularly important study in this tradition is that of Ohtsuki and colleagues [5] at Harvard's productive Program for Evolutionary Dynamics. Their complex mathematical derivations and computer simulations conveniently conform to a rather simple rule: evolution favors cooperation if the benefit b of receiving cooperation divided by the cost c of giving it exceeds the average number of neighbors k in the population, i.e., b/c > k. This parallels Hamilton's famous rule, in which the b/c ratio had to exceed the reciprocal of the relatedness r (i.e., rb > c) in order for cooperation to thrive [6].

Ohtsuki et al.’s simulations find that this rule holds in a wide variety of graph structures: lattices, cycles, random regular graphs, random graphs, and scale-free networks. Square lattices involve either von Neumann (k = 4) or Moore (k = 8) neighbors. Cycles involve a circular arrangement of agents where k = 2. In random regular graphs, the links between agents are random except that every agent has an equal number of links (k). Random graphs are similar except that agents have an average of k links, rather than exactly k. Scale-free networks are hub-like graphs generated according to the method of preferential attachment: an agent links with others in proportion to the other’s connectivity [7].
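
For readers who want to experiment, these graph families can be generated with the networkx Python library. This is our sketch, not part of the paper, with arbitrary sizes and degrees; it omits the Moore-neighborhood lattice, which would need diagonal edges added by hand:

```python
# Illustrative constructions (via the networkx library) of the graph families
# in which the b/c > k rule was tested.
import networkx as nx

n = 100  # number of agents

graphs = {
    'cycle (k = 2)': nx.cycle_graph(n),
    'lattice, von Neumann (k = 4)': nx.grid_2d_graph(10, 10, periodic=True),
    'random regular (k = 6)': nx.random_regular_graph(6, n),
    'random (mean k of 6)': nx.erdos_renyi_graph(n, 6 / (n - 1)),
    'scale-free, preferential attachment': nx.barabasi_albert_graph(n, 3),
}

for name, g in graphs.items():
    mean_k = 2 * g.number_of_edges() / g.number_of_nodes()
    print(f'{name}: average degree {mean_k:.2f}')
```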

Similar results hold for two different reproductive schemes: death-birth and imitation. In death-birth updating, at each time step a random individual is chosen to die, and its neighbors compete for the empty site in proportion to their fitness. In imitation updating, at each time step a randomly chosen agent either keeps its own strategy or imitates a neighbor's strategy, with probabilities proportional to fitness. In both cases, fitness is determined by the outcome of each agent's interactions with its neighbors.

Figure: Death-birth updating (image from Ohtsuki et al., 2006).

In this graph [5], a blue cooperator competes with a red defector for a newly freed location via death-birth updating. The cooperating candidate's fitness is 2b - 4c because it receives cooperation from two neighboring cooperators and pays the cost of cooperating with all four of its neighbors. The defecting candidate's fitness is b because it receives cooperation from one cooperator and gives no cooperation.
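
Here is a minimal Python sketch (ours, not the authors' code) of that payoff bookkeeping, with arbitrary b and c:

```python
# Payoff bookkeeping for the figure: a cooperator pays cost c toward each
# neighbor, and every agent receives benefit b from each cooperating neighbor.
def fitness(strategy, neighbor_strategies, b=3.0, c=1.0):
    received = b * sum(1 for s in neighbor_strategies if s == 'C')
    paid = c * len(neighbor_strategies) if strategy == 'C' else 0.0
    return received - paid

# The figure's two candidates, each with four neighbors (k = 4):
print(fitness('C', ['C', 'C', 'D', 'D']))  # 2b - 4c = 2.0 with b = 3, c = 1
print(fitness('D', ['C', 'D', 'D', 'D']))  # b = 3.0

# Note b/c = 3 < k = 4 here, so the defector does better, consistent with the rule.
```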

For imitation updating, cooperation evolves as long as b/c > k + 2, the extra 2 arising because each agent is effectively its own neighbor.
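
To see the death-birth rule in action, here is a minimal simulation sketch on a cycle (k = 2). It is our illustration, using the paper's weak-selection fitness f = 1 - w + w(payoff), not Ohtsuki et al.'s code; the population size, selection strength, and run counts are arbitrary:

```python
import random

# Death-birth updating on a cycle (k = 2): a random agent dies, and its two
# neighbors compete for the empty site in proportion to their fitness.

def run_until_fixation(n, b, c, w, rng):
    """One run from a half-cooperator start; True if cooperation fixates."""
    strategies = ['C'] * (n // 2) + ['D'] * (n - n // 2)
    rng.shuffle(strategies)
    num_c = strategies.count('C')

    def fitness(i):
        nbrs = [(i - 1) % n, (i + 1) % n]  # two neighbors on a ring
        pay = sum(b for j in nbrs if strategies[j] == 'C')
        if strategies[i] == 'C':
            pay -= c * len(nbrs)  # cooperators pay c toward each neighbor
        return 1 - w + w * pay    # weak-selection fitness, as in the paper

    while 0 < num_c < n:
        i = rng.randrange(n)               # a random individual dies ...
        nbrs = [(i - 1) % n, (i + 1) % n]  # ... and its neighbors compete
        winner = rng.choices(nbrs, weights=[fitness(j) for j in nbrs])[0]
        if strategies[i] != strategies[winner]:
            num_c += 1 if strategies[winner] == 'C' else -1
            strategies[i] = strategies[winner]
    return num_c == n

def cooperation_fixation_rate(b, c, runs=500, n=20, w=0.1, seed=0):
    rng = random.Random(seed)
    return sum(run_until_fixation(n, b, c, w, rng) for _ in range(runs)) / runs

# b/c = 5 > k = 2 should favor cooperation relative to b/c = 1 < 2. Exact
# rates vary with n, w, and seed; the published rule concerns fixation
# probabilities in the weak-selection limit, so expect a shift, not a cliff.
print(cooperation_fixation_rate(b=5.0, c=1.0))
print(cooperation_fixation_rate(b=1.0, c=1.0))
```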

The last sentence of the Ohtsuki paper [5] nicely summarizes the results: "The fewer friends I have, the more strongly my fate is bound to theirs." This derives from the effect of k, which varies from 2 to 10 across their simulations: as k decreases, smaller values of the b/c ratio suffice for cooperation to be favored.

In a very crowded literature, this paper is particularly notable for including both simulations and mathematical analysis; in effect, the simulation results provide empirical confirmation of the mathematics. Typical studies use only one of these methods, leaving readers to wonder whether the results would generalize to parameter settings other than those used in simulations, or whether the mathematical analysis was done properly or can predict empirical simulation results. The paper is also notable for including a fairly wide variety of graph structures. More typically, one sees results for only one particular graph, most often a square lattice. In all of these ways, the Ohtsuki et al. paper serves as inspiration for future theoretical work on evolution.

References

1. Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390-1396.

2. Wedekind, C., & Milinski, M. (2000). Cooperation through image scoring in humans. Science, 288, 850-852.

3. Nowak, M. A. (2006). Evolutionary dynamics. Cambridge, MA: Harvard University Press.

4. Lieberman, E., Hauert, C., & Nowak, M. A. (2005). Evolutionary dynamics on graphs. Nature, 433, 312-316.

5. Ohtsuki, H., Hauert, C., Lieberman, E., & Nowak, M. A. (2006). A simple rule for the evolution of cooperation on graphs and social networks. Nature, 441(7092), 502-505. DOI: 10.1038/nature04605

6. Hamilton, W. D. (1964). The genetical evolution of social behaviour, I. Journal of Theoretical Biology, 7, 1-16.

7. Santos, F. C., & Pacheco, J. M. (2005). Scale-free networks provide a unifying framework for the emergence of cooperation. Physical Review Letters, 95, 098104.
