Cooperation and the evolution of intelligence

One of the puzzles of evolutionary anthropology is to understand how our brains got to grow so big. At first sight, the question seems like a no-brainer (pause for eye-roll): big brains make us smarter and more adaptable, and thus result in an obvious increase in fitness, right? The problem is that brains need calories, and lots of them. Though it accounts for only about 2% of your total body weight, your brain will consume about 20-25% of your energy intake. Furthermore, the brain, sitting behind the blood-brain barrier, doesn't have access to the same energy resources as the rest of your body, which is part of the reason why you can't safely starve yourself thin (if it ever crossed your mind).

So maintaining a big brain requires time and resources. For us, the trade-off is obvious, but if you’re interested in human evolutionary history, you must keep in mind that our ancestors did not have access to chain food stores or high fructose corn syrup, nor were they concerned with getting a college degree. They were dealing with a different set of trade-offs and this is what evolutionary anthropologists are after. What is it that our ancestors’ brains allowed them to do so well that warranted such unequal energy allocation?

EvoAnth’s Why Our Brains Are Big provides a good overview of the current hypotheses on the subject. The main one is the Social Brain Hypothesis (Dunbar, 1998), which states that as our distant ancestors became more social and their social groups grew larger, there was an increasing need for better memory, more effective decision-making, and more comprehensive modes of representation (e.g. Theory of Mind). These needs in turn exerted the evolutionary pressure for bigger, more complex brains. Support for this hypothesis comes mostly from correlational studies, and no mechanistic description has so far been proposed.

In their paper, McNally, Brown & Jackson (2012) looked into whether a computational simulation of the evolutionary dynamics between agents of varying cognitive abilities could provide such a description. In their simulation, each agent is a neural network consisting of input and output nodes, weighted edges, and a hidden layer of cognitive and context nodes. The number of “neurons” in the hidden layer is used as a measure of intelligence. From one generation to the next, an offspring inherits its parent’s network structure as well as its weights. These traits are, however, subject to a constant rate of mutation. In particular, the hidden layer can gain or lose nodes (higher or lower intelligence).
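The inheritance-with-mutation step can be sketched roughly as follows. This is a minimal sketch of the idea, not the paper's implementation: the function name, mutation rates, and the simplification of keeping the weight list fixed when a hidden node is added or removed are all my own.

```python
import random

def reproduce(weights, n_hidden, w_mut=0.05, node_mut=0.01, rng=None):
    """Return (child_weights, child_n_hidden): the offspring copies its
    parent's weights, each weight may be perturbed, and the hidden layer
    may gain or lose a node (the paper's proxy for intelligence).
    Rates and perturbation size are illustrative, not the paper's values."""
    rng = rng or random.Random()
    child = [w + rng.gauss(0, 0.1) if rng.random() < w_mut else w
             for w in weights]
    child_n = n_hidden
    if rng.random() < node_mut:
        # gain or lose one hidden node, never dropping below one
        child_n = max(1, n_hidden + rng.choice([-1, 1]))
    return child, child_n
```

In the real model, adding or removing a node would also add or remove the edges (and weights) attached to it; that bookkeeping is elided here.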

During each generation, each agent interacts with every other agent in the population in instances of the Prisoner’s Dilemma or the Snowdrift game. The paper claims that these are iterated games but, given the description, this is clearly inaccurate: agents never play against the same partner twice. After each interaction, the agent’s network takes as inputs its own and its partner’s payoffs and computes the probability of cooperating during the next interaction.
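For concreteness, here are the standard one-shot payoff structures of the two games; the particular numbers are illustrative values of mine satisfying the usual orderings, not the parameters used in the paper.

```python
# Prisoner's Dilemma ordering: T > R > P > S
PD = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
      ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

# Snowdrift ordering: T > R > S > P (mutual defection is worst)
SNOWDRIFT = {('C', 'C'): (3, 3), ('C', 'D'): (1, 5),
             ('D', 'C'): (5, 1), ('D', 'D'): (0, 0)}

def play(game, move_a, move_b):
    """Return (payoff_a, payoff_b). In the model, both payoffs are then fed
    into each agent's network to set its cooperation probability for the
    *next* interaction, against a different partner."""
    return game[(move_a, move_b)]
```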

Computation in the neural network proceeds by each node taking the weighted sum of its inputs. The result is passed through a sigmoid squashing function that determines whether the node will fire. The sigmoid function pushes the final probability towards the extremes of 0 or 1, but there will be some noise. From this point of view, it seems randomness will be difficult to achieve, a fact that contrasts with some ideas about agency explored before on TheEGG blog.

Context nodes also play an interesting role: during a neural computation, the context node first passes its stored value to its associated cognitive node, where it is integrated along with the other inputs, and is then updated with the current value of the cognitive node. This mechanism is said to be similar to “emotional states”, whereby a “memory of past interactions” is accumulated without keeping a detailed sequence of events.
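The two mechanisms above, the steep sigmoid and the cognitive/context node pairing, can be sketched in a few lines. Again, this is an assumed reconstruction: the function names, the gain parameter, and the exact wiring are mine, chosen only to match the verbal description.

```python
import math

def sigmoid(x, gain=5.0):
    # a steep sigmoid pushes the output towards the extremes of 0 or 1
    return 1.0 / (1.0 + math.exp(-gain * x))

def forward(inputs, w_in, w_ctx, w_out, context):
    """One computation step for a network with paired cognitive/context nodes.
    Each cognitive node sums its weighted inputs plus its context node's
    stored value; the context node is then overwritten with the new cognitive
    value, accumulating a 'memory of past interactions'. Returns the
    probability of cooperating in the next interaction."""
    cognitive = []
    for j in range(len(context)):
        s = sum(w * x for w, x in zip(w_in[j], inputs)) + w_ctx[j] * context[j]
        cognitive.append(sigmoid(s))
    context[:] = cognitive  # update the memory in place
    return sigmoid(sum(w * c for w, c in zip(w_out, cognitive)))
```

Calling `forward` repeatedly with the payoffs from successive interactions lets the context values drift, which is the sense in which the network "remembers" the population without tracking individuals.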

When all interactions have been played, agents produce offspring with probability proportional to their mean payoff minus a cost for intelligence. Taxing higher cognitive ability is important: without it, we would not be studying a trade-off anymore. Artem used a similar implementation of a cost on cognition in Kaznatcheev (2010). Finally, before the next cycle commences, the parent networks die.
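The selection step then amounts to fitness-proportional sampling with an intelligence tax. The linear cost function and the sampling scheme below are my own assumptions for illustration, not necessarily the paper's exact choices.

```python
import random

def next_generation(agents, mean_payoffs, cost_per_node=0.01, rng=None):
    """Replace the whole population: fitness is mean payoff minus a linear
    cost on hidden-layer size (the tax on intelligence), and the new
    generation is sampled in proportion to fitness. Parents all die."""
    rng = rng or random.Random()
    fitness = [max(1e-9, p - cost_per_node * a['n_hidden'])
               for a, p in zip(agents, mean_payoffs)]
    return [dict(a) for a in rng.choices(agents, weights=fitness, k=len(agents))]
```

With equal payoffs, a larger hidden layer strictly lowers fitness, which is exactly the trade-off the model needs: extra nodes must earn their keep through better play.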

When I remarked that it was inaccurate to call these games ‘iterated’, I was not merely being picky about terminology: I think that the way the authors use these neural networks undermines the claims they hope to make about the evolution of intelligence. An intuitive understanding of the social intelligence hypothesis is that our greater cognitive abilities made it more likely for us to predict whether our partner will cooperate or defect and thus adjust our strategy accordingly in order to maximize our payoff. In this model however, it is clear that the decision to cooperate or defect has already been made when an agent meets its partner.

That being said, it is true that context nodes store information about all past interactions, hence effectively providing an educated guess about the whole population, and by the same token, the next player. Presented like that, the model already sounds more convincing and in accordance with this scheme, the proxy for intelligence should be more tightly linked with the number of context nodes only. Talking about these games as iterated, and comparing the emergent networks to well-known IPD strategies such as tit-for-tat or win-stay, lose-shift is misleading at best.

What this paper wanted to provide is evidence for an evolutionary history resembling a Machiavellian arms race, where selection for efficient decision-making in cooperative games exerts a pressure for greater cognitive abilities (more complex strategies), which in turn select for greater intelligence. The strongest case for this story is given by the positive correlation between the frequency of contingent strategies and the selection for intelligence (measured here as the covariance between fitness and intelligence). However, this correlation depends closely on the level of cooperation, which goes through a cycle: a surge in cooperation levels brought about by high frequencies of contingent strategies, followed by an invasion of simple, always-cooperate-like strategies, and a subsequent invasion by always-defect-like strategies. This oscillation between cooperation and defection is well known in the IPD literature, but it is interesting to see it emerge here from an initially random population of networks in a non-IPD setting.

I think the logical extension of this model would be to make agents more discerning, that is to allow the neural network to use information about its current partner (e.g. total payoff, last payoff, or average payoff) in order to make an informed decision, and after the interaction, allow it to update its values according to its payoff. While such conditional strategies have been studied (e.g. Szolnoki, Xie, Wang, Perc (2011); Szolnoki, Xie, Ye, Perc (2013)), applying the idea to neural networks would, to my knowledge, be novel.

In another possible extension, as the authors remark, agents could play not just one but many possible games. Indeed, perhaps what makes social interactions so complex is not only that different players may have different strategies, but also that we are constantly playing different games, sometimes simultaneously, thus giving rise to an increased need for conflict-management abilities. Artem mentioned this idea as a meta-game in an earlier post.

I’ll end this post with a few thoughts about deception, which is my primary line of research these days. Supposing the Social Brain Hypothesis is true, we can move on to the next question: given that humans’ social interactions are extremely varied and complex, which ones proved instrumental in this intelligence growth process? One idea is that higher intelligence allows more refined tactical deception and that in turn, even higher intelligence allows more efficient detection, therefore leading to an arms race (see this post for a model exploring this idea). While the idea is intuitive, there are inherent problems that come with studying deception, because the concept itself is often ill-defined or loaded with assumptions. For example, McNally et al. (2012) claim their agents are doing something akin to deception, but for the reasons I’ve mentioned here, it can’t be called that (whatever it is they’re doing). Nevertheless, I think the use of neural networks will prove useful in studying deception, as it could give us a way to observe internal representations (albeit in a very simplified manner). I hope to write more on this topic, so stay tuned!


Dunbar, R. I. M. (1998) The social brain hypothesis. Evol. Anthropol. 6, 178–190. (doi:10.1002/(SICI)1520-6505(1998)6:5<178::AID-EVAN5>3.0.CO;2-8)

Kaznatcheev, Artem (2010). The cognitive cost of ethnocentrism. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society

McNally, L., Brown, S.P. & Jackson, A.L. (2012). Cooperation and the evolution of intelligence. Proceedings of the Royal Society B: Biological Sciences, 279 (1740): 3027-3034.

Szolnoki, A., Xie, N.-G., Wang, C., and Perc, M. (2011) Imitating emotions instead of strategies in spatial games elevates social welfare. EPL, 96, 38002.

Szolnoki, A., Xie, N.-G., Ye,  Y., and Perc, M. (2013) Evolution of emotions on networks leads to the evolution of cooperation in social dilemmas. EPL, 96, 38002.


About Keven Poulin
I'm a cognitive science student with a math background. Like many these days, I'd like to know more about the way we think (or the way we think we think). Jumping from subjective and objective points of view, I look for a happy medium. My research now focuses on deception, with an eye on the bigger fish: self-deception.

15 Responses to Cooperation and the evolution of intelligence

  1. Unfortunately for the SBH, Neanderthals had larger cranial capacities than modern humans by 200cc but much smaller social networks…

    • Keven Poulin says:

      Thanks for the comment. That is a piece of information I was not aware of, and it sent me on a little research expedition. Very little, actually, since it turns out EvoAnth has a blog post that addresses this very concern. In a nutshell, it’s thought that the Neanderthal brain had a bigger visual cortex (inferred from the orbital cavity). Once they controlled for that, they found numbers that were in line with the SBH.

      Of course, I’m not trying to defend this hypothesis at all costs. But is there a competing hypothesis not related to social group size to explain humans’ and Neanderthals’ brain size?

      • That study was actually published relatively recently, and it caused a stir on the internet of “Neanderthals went extinct because of good eyesight“. Unfortunately, controlling for everything that isn’t social cognition has a way of trivializing the SBH. “If we control for everything that isn’t related to social cognition then animals with more extensive social structure have larger brain areas associated with social cognition” just doesn’t sound that impressive.

        • Keven Poulin says:

          When you say it like that, I do begin to see the very slippery slope towards the tautology. I don’t know how strong an argument that is, but the problem I have with bringing Neanderthals into the picture is that we don’t really know what they did, do we? In particular, we don’t know if they did anything special, if they had some amazing skill that required a lot of brain power, but that paid off in the end. The SBH was based on data from primates which we can observe. These observations preceded and informed the candidate hypotheses.

  2. michael clarke says:

    If I understand the article correctly (the McNally, Brown & Jackson (2012) study confused me a bit), it is that as we became more social and populations grew larger, intelligence became more important, I’m assuming for impressing mates. The forebrain is where mental processes take place and, according to the “Discovery Psychology: fifth edition” textbook, that portion of our brain represents 90% of it. Did this portion of our brain dramatically grow in size so we could better process how to socialize with potential mates?

    • Keven Poulin says:

      Yes, according to the hypothesis, our brains (more specifically, the neocortex) grew in size as our social groups grew in size as well. But it’s not so much about mating (though it can be part of it). It’s about being able to deal with all the social interactions that happen in a group: who are your friends? who are your enemies? who are your enemies’ friends? who is nice? who isn’t? etc. It’s also about the possibility of theory of mind and tactical deception: he thinks that I’m going to do A, so I’ll do B, etc. You can apply all this cognitive arsenal to find and keep mates, and you probably will, but it doesn’t need to be the main driving force, since a social group itself offers great advantages for survival: safety, work management, etc.

      In short, McNally et al. used simulated agents and had them play games. Some agents were “smarter” than others (they had more complex ways of deciding how to play, but that complexity came at a cost), and the authors looked at whether the trade-off was worthwhile.

      • I wouldn’t be so dismissive of the importance of mate selection in favor of general game dynamics. In particular, Dunbar is currently moving his research towards looking at the importance of pair-bonding to the SBH:

        Dunbar, R.I.M., & Shultz, S. (2007) Evolution in the Social Brain. Science 317.

        Palchykov, V., Kaski, K., Kertész, J., Barabási, A. L., & Dunbar, R. I. (2012). Sex differences in intimate relationships. Scientific Reports, 2.

        And he is not the only one to suggest that sexual selection is important for brain size:

        Schillaci, M.A., 2006. Sexual Selection and the Evolution of Brain Size in Primates. PLoS ONE 1(1): e62.

        Of course, this doesn’t mean that we should abandon a game theoretic analysis, since EGT can also provide insights in understanding the transition from promiscuity to pair-bonding:

        Gavrilets, S. (2012). Human origins and the transition from promiscuity to pair-bonding. Proceedings of the National Academy of Sciences, 109(25), 9923-9928.

        Of course, all of this isn’t quite as simple as Michael Clarke suggests with “how to socialize with potential mates”, but apparently (to bring back our original theme) detecting deception might be more important in pair-bonded than promiscuous societies. Full circle?

  3. Thank you for the awesome post Keven! I hope to see many more like it from you :D.

    I am always a little bit skeptical of models that use neural networks but don’t do a very careful job of analyzing and explaining what the neural network does and how. These sorts of models often tend to give themselves a false allure of ‘realism’ while hiding behind a universal learning algorithm that they don’t understand well at all — the most dangerous curse of computing. Of course, this would be fine if the models were actual insilications and the neural networks accurately (in a quantifiably testable way that is consistent with some underlying reductionist theory as opposed to fitted to data) captured reality, but these neural net models were usually built by computer scientists with only a vague notion of neurobiology and often don’t reflect dynamics a neuroscientist would find convincing.

    I think the distinction between the iterated PD where you condition on each individual and one where you condition only on statistical properties of all agents is an interesting one to think about more deeply. I brushed it off too quickly during our meetings. In particular, since the populations still remain relatively homogeneous, it should be possible to infer useful information. Also, since your fitness is based on your average performance, it would not be surprising if part of conditioning on individuals can come out in the wash.

    In general, when theoretical computer scientists look at just learning (or follow Valiant in thinking of evolution as a subclass of machine learning), then the distinction between reasoning about each individual agent (or, each learning example) and statistical properties of the ensemble is studied as the difference between full PAC-learning and Kearns’ (1998) statistical queries (SQ) model. A lot of surprising things are learnable in SQ, and Valiant’s evolvability model is a subset of SQ (Feldman, 2009), so it could be worthwhile to think about this difference. Of course, these are just vague connections.

    Kearns, M.J. (1998) “Efficient noise-tolerant learning from statistical queries”, Journal of the ACM 45(6): 983-1006.

    Feldman, V. (2009) “A complete characterization of statistical query learning with applications to evolvability”, 50th Annual IEEE Symposium on Foundations of Computer Science: 375-384.

    • Keven Poulin says:

      I share your concerns about the liberal use of neural networks to make inferences about the activity of actual networks of neurons in the brain. But in this article, the claims were not so much about what goes on in the brain but rather about behavioural contingency. As the authors remark, “intelligence” in this context says nothing about the efficiency of a strategy, but just how complex it is. Bottom line is that it’s still a metaphor… I think metaphors are cursed too!

      I agree that deciding on the next move based on stats about the whole population can be very useful. And now that I think of it, even if you condition on a certain aspect of an agent, you are only narrowing down on what part of your history is relevant to understanding the next move. In this simulation, they just happen to see all agents as being the same. While my criticism may have been a little rushed, they still should have addressed this more transparently.

      • The most serious studies I know that get interesting dynamics out of carefully looking at statistical properties of agent behavior is the minority game (or El Farol Bar problem) that is popular in econophysics. This simple game has a surprisingly rich literature and relates closely to the cost of agency (except you have to make k negative in that post, to simulate it being difficult to randomize strategy). The guys have a history of looking at how models relate to ‘intelligence’ and that “large brains always take advantage of small brains” and it might relate to deception:

        Challet, D., & Zhang, Y. C. (1998). On the minority game: Analytical and numerical studies. Physica A: Statistical Mechanics and its Applications, 256(3), 514-532.

        Mitman, K. E., Choe, S. C., & Johnson, N. F. (2005). Competitive advantage for multiple-memory strategies in an artificial market. In SPIE Third International Symposium on Fluctuations and Noise (pp. 225-232). International Society for Optics and Photonics.

        Mello, B. A., & Cajueiro, D. O. (2008). Minority games, diversity, cooperativity and the concept of intelligence. Physica A: Statistical Mechanics and its Applications, 387(2), 557-566.

        Ironically (?) enough, this was a game where I was very willing to use neural net models when Peter and I looked at it briefly some time ago. Except we were trying to be more careful by adapting a neural model analysis as pedantic and careful as Beer (2003). Unfortunately, that project has been put in the rotating drawer, but I am sure I will return to it since the minority game has been bothering me since 2009! Ahh! I wonder if the minority game can be adapted to a setting where we can study the distinction between conditioning on individuals and conditioning on statistics.

        Beer, R.D. (2003), The Dynamics of Active Categorical Perception in an Evolved Model Agent, Adaptive Behavior 11(4): 209-243.

    • >I am always a little bit skeptical of models that use neural networks but don’t do a very careful job of analyzing and explaining what the neural network does and how.

      It depends on what you’re doing with the ANNs. I’d agree with the sentiment that it’s silly to evolve an ANN and make inferences about actual brains from the structure of that ANN. That’s my biggest gripe with the paper this review was written about: they use the number of hidden neurons as a proxy for intelligence! Anyone who knows ANNs knows that that is a horrible proxy for intelligence, or anything really…

  4. Pingback: Replicator dynamics of cooperation and deception | Theory, Evolution, and Games Group

  5. Pingback: Hunger Games themed semi-iterated prisoner’s dilemma tournament | Theory, Evolution, and Games Group

  6. Pingback: Stats 101: an update on readership | Theory, Evolution, and Games Group

  7. Pingback: Cataloging a year of blogging: from behavior to society and mind | Theory, Evolution, and Games Group
