# Evolving useful delusions to promote cooperation

This joint work with Marcel Montrey and Thomas Shultz combines ideas from biology, economics, and a little bit of cognitive science, in keeping with the interdisciplinary theme of this symposium, and approaches them through applied mathematics. This post is a transcript of a presentation I gave on March 27th and covers part of my presentation today at Swarmfest.

Let’s start with the biology aspects of this. If we look at the popular view of evolutionary theory, as people usually associate with Darwin, his bulldog — Huxley, his rottweiler — Dawkins, and people in the gene-centered crowd:

• Genes are very self-interested or selfish,
• the organism reacts to the environment external to itself — there is some environment and fitness landscape that it explores,
• the organism or genes are trying to maximize their fecundity or number of offspring they will have in the next generation, and
• the whole process is kind of random, but selective over time.

This is the “Red in tooth and claw” view of nature. We have the same kind of view in economics of Homo economicus where the agent is again very self-interested, reacting to external events, and is trying to maximize their own payoff/utility, and — unlike the random case — is going through a rational process of decision making.

In biology, it turns out that this naive view is not very reasonable. The alternative is associated with thinkers like Kropotkin, Hamilton, and Axelrod, and with the study of cooperation: the fact that organisms are willing to give up some of their reproductive potential to aid other organisms in their environment. You see this through mechanisms like kin selection, the green-beard effect, and many others that I won’t mention. My favorite example of cooperation (instead of the usual bees and ants that everyone talks about) is the slime mold. It is usually a solitary predator, but when the environment becomes very austere, the cells get together, form a sheath around themselves to climb towards warmth and a better environment, and then form a stalk. Some of the microorganisms are at the top of the stalk, and some form the stalk itself; the ones at the top get to disperse and survive in a new environment, while the ones that make up the stalk perish. Thus, they display the most sincere kind of cooperation: they completely sacrifice their lives and all future offspring to help their brethren.

Similarly, you see the same sort of irrationality and cooperation in humans. We are willing to engage in behavior for the aid of other people. We see this in costly punishment, where we will pay a cost out of our own wealth to punish a norm violator, and in ideas like fairness and the transfer of social status to guide cooperation. We also see violations of things that we expect rational agents to follow, like the sure-thing principle. Suppose there are only two possible conditions, A or B. If in condition A you would prefer to do activity X, and in condition B you would also prefer to do X, then it should not be the case that when the condition is indeterminate (you only know it is A or B) you do activity Y. You should still do X, but humans don’t always behave this way. We follow mechanisms like quasi-magical thinking or self-deception. How can we address this?

One of the ways we want to address this is by looking at the subjective experience. This is not foreign to economists: what we are looking at is agents acting rationally, but on potential misrepresentations of the objective world around them. Each agent can hold different beliefs about the effects of their actions and the payoffs that they would receive. They act to maximize their own subjective utility, but that might not necessarily correspond to any objective measure of fitness.

How do we study this with a mathematical framework? We set up a competitive environment. The most popular approach is to look at it through the Prisoners’ dilemma game, which I will describe as the Knitter’s dilemma. Imagine that there are two agents, Alice and Bob and they both want to make a sweater. Alice only produces yarn, and Bob only produces needles, but they need both ingredients to manufacture a sweater. Since they can’t exchange in public, they decide to meet up at night under a bridge and trade briefcases with their respective goods. Now Alice is faced with a dilemma: should she include yarn in her briefcase or not? Let us look at the cases:

• If Bob includes needles, and Alice includes yarn then since she has extra yarn at home, she can come back and make a sweater but she will pay a small cost for giving away some of her yarn. Hence, she gets some benefit minus some cost.
• If she tricks Bob and brings an empty briefcase then she has all the benefit of having needles and going home to make a sweater, but none of the cost of giving away some of her yarn.
• Of course, Bob might not keep his end of the deal and bring an empty briefcase. If Alice brings yarn then she’ll be giving away some of her yarn for nothing and incur a small cost.
• If she also brings an empty briefcase then they just wasted some time at night trading briefcases.

You can see that the rational behavior in this case is to bring an empty briefcase. No matter what happens, you are better off bringing an empty briefcase. Bob follows the same reasoning and they arrive at the Nash equilibrium of this game: people going to meet under bridges and trade empty briefcases. However, from outside the game we can see that had they both done what they said they would and actually brought their goods then they would have both been better off; mutual cooperation Pareto dominates the Nash equilibrium. This is the classic idea of a social dilemma: you have to decide to do something that is individually irrational but that leaves both of you better off if you both follow this route.
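The four cases above can be checked mechanically. The following is a minimal sketch, assuming a benefit `b` for receiving the other person's goods and a cost `c` for giving away your own, with `b > c > 0`; the function names and the values `b=3.0`, `c=1.0` are illustrative choices, not numbers from the talk.

```python
def knitters_payoffs(b, c):
    """Alice's payoff for each (alice_move, bob_move) pair.

    'C' means bringing the goods, 'D' means bringing an empty briefcase.
    Assumes b > c > 0 (hypothetical benefit and cost values).
    """
    return {
        ("C", "C"): b - c,  # gets needles, gives up some yarn
        ("C", "D"): -c,     # gives up yarn for nothing
        ("D", "C"): b,      # gets needles, gives up nothing
        ("D", "D"): 0.0,    # both just wasted an evening
    }


def best_response(payoffs, bob_move):
    """Return the move that maximizes Alice's payoff against a fixed move by Bob."""
    return max("CD", key=lambda alice_move: payoffs[(alice_move, bob_move)])


payoffs = knitters_payoffs(b=3.0, c=1.0)

# Defection is the best response whatever Bob does (a dominant strategy).
dominant = {bob_move: best_response(payoffs, bob_move) for bob_move in "CD"}

# Yet mutual cooperation pays more than mutual defection.
cooperation_gap = payoffs[("C", "C")] - payoffs[("D", "D")]
```

Here `dominant` maps both of Bob's moves to `"D"`, while `cooperation_gap` is positive: that pair of facts is what makes the game a social dilemma.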

There are many mechanisms to achieve the Pareto optimum. If we meet over and over again then we can have some sort of reciprocity happening. If we repeat the encounter several times then I’ll start lying to you if you start lying to me. To avoid this work-around, you usually look at the dilemma in a one-time interaction: you are never going to meet this person again and there are no extraneous circumstances beyond that.

This is one of many games that can be played between two agents. We can consider the space of all two-player two-strategy games. In the figure, you will see in the red line the particular formulation of the Prisoners’ dilemma that we saw in terms of benefits and costs; but, in general, that whole upper left corner is the Prisoners’ dilemma — just different variants. However, you will see many other games by looking at the matrix, where:

• 1 — payoff for Alice if Bob cooperates and she cooperates,
• U — payoff for Alice if Bob defects and she cooperates,
• V — payoff for Alice if Bob cooperates and she defects, and
• 0 — payoff if they both defect.

This game space breaks up into four important regions. If $U < 0$ and $V > 1$, it is always rational (no matter what is happening) to defect; that is the only Nash equilibrium in the game, that is the Prisoners’ dilemma. If $U > 0$ and $V < 1$, it is always rational to cooperate, it is never rational to defect; you are always better off just cooperating. In the two regions in between, the rational behavior depends on the expected probability of the other agent cooperating with you. The decision depends on what you know about the world.
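The regions of the $U$-$V$ plane can be written as a small classifier. This is a sketch under two assumptions that are not stated in the talk: first, that the benefit/cost payoffs $(b - c, -c, b, 0)$ are mapped into the $(1, U, V, 0)$ form by dividing through by $b - c$; second, that the region with $U > 0$, $V > 1$ is labelled Hawk-Dove and the region with $U < 0$, $V < 1$ is labelled Stag hunt, following common usage. The function names are hypothetical.

```python
def normalize(b, c):
    """Rescale payoffs (b - c, -c, b, 0) so that mutual cooperation pays 1.

    Returns (U, V). Assumes b > c > 0; the rescaling itself is an assumption.
    """
    if not b > c > 0:
        raise ValueError("this sketch assumes b > c > 0")
    return -c / (b - c), b / (b - c)


def classify_game(U, V):
    """Name the region of the U-V plane, using strict inequalities."""
    if U < 0 and V > 1:
        return "Prisoners' dilemma"    # defection is always rational
    if U > 0 and V < 1:
        return "cooperation dominant"  # cooperation is always rational
    if U > 0 and V > 1:
        return "Hawk-Dove"             # best to do the opposite of your partner
    if U < 0 and V < 1:
        return "Stag hunt"             # best to do the same as your partner
    return "boundary"


U, V = normalize(b=3.0, c=1.0)
region = classify_game(U, V)
```

Under this rescaling every benefit/cost game satisfies $V = 1 - U$ with $U < 0$, so these games trace out a line inside the Prisoners' dilemma region.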

How do we model our agents? We want these agents to interact in some structured fashion; if they interact completely at random then we will always recover the Nash case. The least structure you can introduce while having non-trivial effects is a random 3-regular graph: consider all 3-regular graphs and sample one uniformly at random, and that is our interaction model. The genotype of the agent is what they think the game is. This is similar to how, if I go to the gym, it is encoded in my genes, to some extent, how much pain I will feel from exercising and how much dopamine will be released to counter that. Based on that, I will make decisions on whether I want to go to the gym again or not. Their mind corresponds to two variables: what they think is the probability someone will cooperate with them if they cooperate ($p$), and the probability someone will cooperate if they defect ($q$). Other agents can’t actually condition on the agent’s behavior, so the two values should always be equal, but you do get effects like quasi-magical thinking where agents believe them to be different even though they can’t possibly be different. They update their mind through Bayesian learning and they act subjectively rationally on their beliefs. Given that they think the game is represented by $U$ and $V$, if the expected payoff for cooperating ($p + (1 - p)U$) is higher than the expected payoff for defecting ($qV$) then they cooperate. That is the rationality part for them, based on their subjective representation of the game.
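To make the agent description concrete, here is a minimal sketch of a single agent. The genotype is the believed game $(U, V)$ and the mind is a pair of tallies from which $p$ and $q$ are estimated. The talk only says the update is Bayesian; the uniform prior (which gives the "add one success, add two trials" estimate) is an assumption, as are the class and method names. The expected payoff for defecting is computed from the payoff matrix and the definition of $q$ as $qV + (1 - q) \cdot 0 = qV$.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    U: float  # believed payoff for cooperating against a defector (genotype)
    V: float  # believed payoff for defecting against a cooperator (genotype)
    coop_after_c: int = 0   # partner cooperated when I cooperated
    total_after_c: int = 0  # number of times I cooperated
    coop_after_d: int = 0   # partner cooperated when I defected
    total_after_d: int = 0  # number of times I defected

    @property
    def p(self):
        """Estimated chance of being cooperated with if I cooperate (uniform prior assumed)."""
        return (self.coop_after_c + 1) / (self.total_after_c + 2)

    @property
    def q(self):
        """Estimated chance of being cooperated with if I defect (uniform prior assumed)."""
        return (self.coop_after_d + 1) / (self.total_after_d + 2)

    def cooperates(self):
        """Subjectively rational choice: cooperate if p + (1 - p) U > q V."""
        return self.p + (1 - self.p) * self.U > self.q * self.V

    def observe(self, i_cooperated, partner_cooperated):
        """Update the tally that matches my own move in this interaction."""
        if i_cooperated:
            self.total_after_c += 1
            self.coop_after_c += int(partner_cooperated)
        else:
            self.total_after_d += 1
            self.coop_after_d += int(partner_cooperated)


# With no experience p = q = 1/2, so the believed game alone decides.
pd_believer = Agent(U=-0.5, V=1.5)
harmony_believer = Agent(U=0.5, V=0.5)

# Quasi-magical thinking: same Prisoners' dilemma beliefs, but experience
# that happens to make p much larger than q.
magical = Agent(U=-0.5, V=1.5)
for _ in range(8):
    magical.observe(i_cooperated=True, partner_cooperated=True)
    magical.observe(i_cooperated=False, partner_cooperated=False)
```

In this sketch `pd_believer` defects and `harmony_believer` cooperates from the start, while `magical` ends up cooperating despite believing the game is a Prisoners' dilemma, because its estimates of $p$ and $q$ have drifted apart.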

What happens? If the game is highly competitive ($c/b$ ratio near 1) then it is very easy to exploit the cooperators and recover the standard inviscid results. In the above figure, we have the density of how many agents evolve a certain brain, and we can see that the agents are concentrated mostly in the Prisoners’ dilemma section. They are all thinking that the game is a Prisoners’ dilemma game, and they don’t cooperate with each other as we can see from the red lines in the plot to the right.

However, if you make the competitiveness of the environment a little bit more mild then you will actually see a complete shift to the blue lines. All of the agents will evolve a representation of the world that is not consistent with the real interaction that is happening — a Prisoners’ dilemma, just a slightly less painful one — instead they will believe the game is one where the optimal strategy is to cooperate.

In a highly competitive environment, the agents will mostly evolve towards an objectively rational strategy, although there are a few that will evolve towards games where defection is only one of the possible rational behaviors. This is why in some simulations you see a sort of phase transition, where most of them think it is a Hawk-Dove game and then someone starts to cooperate and they all follow. But in most worlds, they think it is a Prisoners’ dilemma game. In a mildly competitive environment (less competitive than the first case we saw) they will evolve representations where the only rational strategy is to cooperate. Thus, there will be mostly cooperative interactions happening. In fact, no agents evolve towards a Prisoners’ dilemma representation of the game.

What does this mean for us? There are two ways we can interpret these results. If we interpret them from the point of view of “the experimenter knows best” (she designed the objective game that interactions happen in), then the agents don’t evolve internal representations that match the real world. In this interpretation, misrepresenting the real world is not detrimental to the individual and can actually benefit both the individual and the group.

The other interpretation, of course, is that “evolution knows best”. These interactions we design between the agents are not necessarily accurate depictions of the game that is happening because of the other externalities like the spatial factors. This means that if we always take an overly reductionist account, we say “oh, I know their pairwise interactions, I should be able to infer everything based on that” but that might not be everything that is happening. They are actually evolving towards a more holistic representation of what is happening and incorporating the externalities of their actions — things that are not direct effects on their fitness, but will trickle down to them in the long term.

From the ivory tower of the School of Computer Science and Department of Psychology at McGill University, I marvel at the world through algorithmic lenses. My specific interests are in quantum computing, evolutionary game theory, modern evolutionary synthesis, and theoretical cognitive science. Previously I was at the Institute for Quantum Computing and Department of Combinatorics & Optimization at the University of Waterloo and a visitor to the Centre for Quantum Technologies at the National University of Singapore.

### 11 Responses to Evolving useful delusions to promote cooperation

1. Terrific T says:

Somehow the term “leap of faith” comes to mind. Obviously not something that one would do if to be completely rational and selfish. But we do it anyways – perhaps an expected cultural norm? But then where did that come from?

I wonder whether you could model a system where the agents develop certain confidence/trust and then have that broken (the other person brings an empty suitcase), and then that trust recovers by certain % afterward, which will decide if the agent would choose to cooperate or defect. Agents do still meet each other once, but this means previous experience might affect current decision making. What would that system look like?

(I remember one time a random guy on the street with a story about being locked out and no money to get home asked me for money. I gave him 20 bucks but never heard from him again. I kicked myself for being too nice. The next time I bumped into someone who actually needed help, I hesitated to give him money, and then again kicked myself because I think he would really need it.)

Anyways, might be an interesting thing to study. Did anyone do that already?

• Yeah, it is in that strange zone between fully iterated games (where you remember the actions of each person you interact with) and single-shot games where you can’t condition on the past at all. It is usually modelled as you having memory, but with nothing you can use to tell agents apart (so you can’t remember something about a specific agent, only about agents in general). Take a look at Keven’s first post for an example. In that paper the neural nets can remember the past, but there is no marker by which to recognize specific agents. Maybe Keven will see this comment and join the discussion.