# Habitual selfish agents and rationality

Many game-theoretic studies of cooperation have focused on the trade-off between selfish behavior that optimizes the individual’s payoff and cooperative behavior that optimizes the group’s. Because this trade-off is inherent to the prisoner’s dilemma, that game has been a preferred tool for studying mechanisms that encourage cooperation among selfish agents. Davies et al. [1] adopt a different framework, however: that of a coordination/anti-coordination game. Note that (anti-)coordination games belong to the general class of cooperate-defect games that Artem discussed previously; they fall among the $X$ and $Y$ games he covered at the end of that post.

To get a flavor of Davies et al.’s game, imagine a social network of scientists, each of whom is either (symmetrically) compatible or incompatible with each of their peers. Compatibility results in good collaborations, while incompatibility leads to shoddy research that benefits no one. Collaborations occur every night in one of two bars. Initially, scientists pick a bar at random. Every night after that, each scientist decides, in random order, whether to stick with the bar they visited the previous night or to switch bars to meet a greater number of compatible researchers. Because payoffs are symmetric, a stable configuration is quickly reached in which no one wants to switch bars. While this arrangement represents a local maximum, in that each scientist makes the selfishly rational choice to stay, it generally fails to maximize the group’s total productivity. Thus, much as in the prisoner’s dilemma, rational agents have no motivation to unilaterally switch bars in the hope that others follow suit, even though such selfishness ultimately results in lower payoffs for the whole group.
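The bar story can be made concrete with a small simulation. What follows is only a minimal sketch under assumptions that are not spelled out in the post: bars are encoded as `+1` and `-1`, compatibility as a symmetric matrix `w` with entries `+1` (compatible) or `-1` (incompatible), and a scientist's payoff as the sum of `w[i][j] * bars[i] * bars[j]` over peers, so that sharing a bar with a compatible peer or avoiding an incompatible one both count as `+1`. All function names are hypothetical.

```python
import random

def make_compatibility(n, rng):
    """Random symmetric compatibility matrix with a zero diagonal (assumed encoding)."""
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.choice((-1, 1))
    return w

def utility(i, bars, w):
    """Payoff to scientist i given everyone's current bar (+1 or -1)."""
    return sum(w[i][j] * bars[i] * bars[j] for j in range(len(bars)))

def total_utility(bars, w):
    """Group payoff: the sum of all individual payoffs."""
    return sum(utility(i, bars, w) for i in range(len(bars)))

def settle(bars, w, rng, max_nights=1000):
    """Selfish dynamics: each night, in random order, a scientist switches
    bars only if that strictly raises their own payoff. Modifies bars in
    place and returns the number of nights on which someone switched."""
    n = len(bars)
    for night in range(max_nights):
        switched = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            # Switching bars flips the sign of i's payoff, so it pays
            # exactly when the current payoff is negative.
            if utility(i, bars, w) < 0:
                bars[i] = -bars[i]
                switched = True
        if not switched:
            return night
    return max_nights
```

Running `settle` from several random starting assignments on the same `w` and comparing `total_utility` at the end is one way to see that different stable configurations can carry different group payoffs.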

Davies et al. investigate how to tweak each agent’s local strategy (local in the sense that agents have no knowledge of, or concern for, the global utility) such that agents remain selfishly rational in some sense, yet still optimize the global utility function. Their answer comes in the form of habituation: agents develop a preference for whichever agents they are currently paired with. Rather than basing their behavior on true utility (a function of the number of agents they are coordinated with), agents base it on perceived utility (the sum of their true utility and their preference for the agents they are interacting with). Because this enables agents to occasionally sacrifice some of their true utility in order to pursue a higher perceived utility (i.e., to switch to a bar with an initially lower payoff), the population is ultimately able to converge reliably to the global maximum. In brief, habituation acts by reinforcing local maxima which, in cases where the global function is a superposition of numerous pairwise interactions, tend to overlap with the global maximum, making it a strong and reliably reachable attractor. (See slides by Peter Helfer, who presented this paper at the EGT Reading Group 38, for a nice overview.)
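To illustrate the distinction between true and perceived utility, here is a rough sketch of habituation. It is an illustration under stated assumptions, not the paper's exact procedure: bars are encoded as `+1`/`-1`, compatibility as a symmetric matrix `w`, preferences as a matrix `pref` that starts at zero, and habituation as a small increment `rate` added to `pref[i][j]` whenever `i` and `j` end up coordinated in a settled configuration (and subtracted otherwise). The random restarts between episodes, the parameter values, and all function names are likewise assumptions.

```python
import random

def perceived_utility(i, bars, w, pref):
    """True payoff plus the agent's habituated preference for its current pairings."""
    return sum((w[i][j] + pref[i][j]) * bars[i] * bars[j] for j in range(len(bars)))

def true_total(bars, w):
    """Group payoff measured with the true compatibilities only."""
    n = len(bars)
    return sum(w[i][j] * bars[i] * bars[j] for i in range(n) for j in range(n))

def relax(bars, w, pref, rng, max_nights=1000):
    """Agents switch bars (in place) whenever that raises their perceived utility."""
    n = len(bars)
    for _ in range(max_nights):
        switched = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            if perceived_utility(i, bars, w, pref) < 0:
                bars[i] = -bars[i]
                switched = True
        if not switched:
            break

def habituate(bars, pref, rate):
    """Strengthen preferences for current pairings (assumed update rule)."""
    n = len(bars)
    for i in range(n):
        for j in range(n):
            if i != j:
                pref[i][j] += rate * bars[i] * bars[j]

def run(w, rng, episodes=200, rate=0.01):
    """Repeat: random start, relax on perceived utility, habituate.
    Returns the learned preferences and the true group payoff per episode."""
    n = len(w)
    pref = [[0.0] * n for _ in range(n)]
    history = []
    for _ in range(episodes):
        bars = [rng.choice((-1, 1)) for _ in range(n)]
        relax(bars, w, pref, rng)
        habituate(bars, pref, rate)
        history.append(true_total(bars, w))
    return pref, history
```

Plotting `history` against the episode number is one way to check whether, for a given `w` and `rate`, the true group payoff of the settled configurations drifts upward as preferences accumulate. The sketch makes no claim about how quickly or how reliably that happens.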

The model’s workings and results aside, there are interesting questions to ask as to what habituation represents in this context. Artem has previously posted about our interest in the dissociation between objective and subjective rationality. At the heart of this issue is the question of whether we can explain certain examples of seemingly irrational behavior within the framework of selfish rationality. To do so, we can appeal to the notion of bounded rationality: agents are not omniscient and must rely on imperfect information to make their decisions. If the imperfection lies in what payoffs the agents think they are getting (i.e., what game they think they are playing), then we may have a straightforward way of accounting for deviations from objective rationality: the behavior is a result of rationality being applied subjectively, by a deluded agent.

For a loose analogy, consider the case of the missionary. From the point of view of an atheist, a Christian travelling to Africa and doing volunteer work there seems highly altruistic, as he is incurring a high cost and deriving few tangible benefits. At first blush, such behavior seems difficult to explain on the basis of selfish rationality. But consider things from the missionary’s point of view. If the reward for selflessness is an eternity in paradise and the alternative is an eternity of damnation, then becoming a missionary yields the best possible payoff. In short, if the world (or game) is what the missionary believes it to be, then the behavior is selfishly rational after all, rather than purely altruistic.

Davies et al. make a similar point. They observe that, though habituation constitutes a deviation from objective rationality, these agents are still playing rationally—they just happen to be playing a different game from the real one. In other words, behaviors are still selected to maximize utility; it’s just that their perceived utility (as defined by the agent) deviates from their true utility (as defined by the game). To return to the notion of bounded rationality, agent “irrationality” here is explainable as a deficiency in the agent’s knowledge of the game, rather than in its decision-making capacities.

So how do we make sense of habituation? While Davies et al. justify it as a simple biasing mechanism worthy of investigation in adaptive networks, it is not altogether clear what this implies for psychology. On one hand, if such cognitive biases are broadly effective at optimizing cooperative behavior, then it is plausible that they evolved, or are commonly adopted through learning. On the other hand, it is difficult to judge whether such a bias is useful in cooperative behavior beyond the present context. Here, it solves the specific problem of the population’s search for an optimal arrangement stalling in local maxima. This is a crucial point, as the goal here (optimizing global payoffs in this coordination/anti-coordination game) is more a problem of constraint satisfaction than of replacing selfish behavior with cooperation. To sum up, we should not approach habituation and its consequences as if it solves the same problem that we face in the prisoner’s dilemma.

### References

1. Davies, A., Watson, R., Mills, R., Buckley, C., & Noble, J. (2011). “If You Can’t Be With the One You Love, Love the One You’re With”: How Individual Habituation of Agent Interactions Improves Global Utility. Artificial Life, 17(3), 167-181. DOI: 10.1162/artl_a_00030