Evolution of complexity

My cats are jittery. The end of the world must be near.

In this post I will outline the questions I wish to answer, the existing approaches to these questions and the intuitions I wish to formalize. Thus I’m going to break all the promises I made in the last post — but really, this should have been the last post, somehow I got sidetracked…

There is an intuition that life has gotten more complex over geological time. More hierarchies have been built, parts have become more specialized, and organisms have possibly even become more intelligent. This intuition holds not only for life, but for societies. Governments, bureaucracy, culture, social divisions: all seem to become more complex with time. The modern division of labor is the pinnacle of this. Why?

Orthodox evolutionary theory does not have an answer to this. And well it does not, because the answer is both nasty — and, I will argue, deathly wrong. Orthodox evolutionary theory proposes that natural selection is the major driver of evolutionary change, that organisms change their forms and types because of differences in fitness. Ergo, things must have become more complex, specialized, etc. — because they are more fit. Humans, as such recent players in evolutionary history, are very complex, very specialized, and very smart — hence we are the evolutionary climax, the last of the rungs of the evolutionary ladder to perfection…

This answer is the disowned natural bastard of orthodox evolutionary theory. It is never given out in modern polite scientific discourse, and most biologists would argue militantly that it is not true. Like all disowned bastards of royalty, however, its head pokes up with alarming frequency, each time with new insurgent allies looking for a coup d’etat: eugenics, racism, social darwinism, all the bad movies using evolution as deus ex machina (X-men!) etc. Hear hear, it seems to say, Evolution is making us better, and we should hasten her along.

I’m fully convinced that this bastard child is incredibly wrong and we ought to have its head on a pike. The trouble is, parts of its parents might have to go also. The parts that lead to this bastard child, at any rate — I will argue that the long trends of macroevolution have little to do with natural selection.

For simplicity, let’s focus on complexity alone for the moment. Current (polite, modern, scientific) discussion about the evolution of complexity hinges on whether it is increasing at all; it turns out “complexity” is an awfully slippery thing to measure. McShea did a great deal of work (1991, 1996, 2005) here, although he and his colleagues aren’t the only group (e.g. Heylighen 1999). In any case, there’s a vast literature on this. I would carefully submit, however, that complexity has in fact increased, since we don’t find any pre-cellular living forms (the earliest living form cannot have been a fully formed bacterium!) and even our oldest archaebacteria are well removed from the earliest living form (say, a self-replicating RNA). Nor did anything as complex as mammals show up right at the beginning of life. Although this work on the definition and measurement of complexity is incredibly valuable, we don’t need a thermometer to tell us that boiling water is hotter than ice.

The question is therefore why. To use the answer of orthodox evolutionary theory, that natural selection drove the extinction of simple organisms and made organisms more and more complex, is intensely unsavory. It’s more than just the political and cultural distastefulness of the answer and the capacity for people to abuse it: I, and many others, do think it’s actually wrong. But then there must be another force outside of natural selection that can drive evolutionary trends on the geological scale. What is this other force? I’ll summarize the existing arguments, and my opinions (why else write a blog?), below:

Gould (Gould 1997) has famously argued that complexity, on average, has nothing to do with natural selection. The increase in complexity over time, he says, is simply the result of a random walk. Sometimes complexity is good for the organism and it grows more complex. Sometimes complexity is less good and organisms grow less complex. However, for biochemical reasons, the first life forms had to be very simple. On the other hand, complexity cannot go below zero. Thus, the evolution of complexity is that of a random walk with an absorbing state of zero. The average complexity then naturally increases, but complexity itself, per se, doesn’t do anything for fitness at all — what is good for fitness is entirely environmentally dependent.
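Gould’s argument is easy to try out numerically. The following is only a minimal sketch, assuming an unbiased walk in which every lineage starts at the minimum and a step that would drop complexity below zero is simply rejected (one of several ways to read the boundary at zero). The function name, step size and number of lineages are all hypothetical choices, not anything taken from Gould.

```python
import random

def mean_complexity(n_lineages=2000, n_steps=400, checkpoints=(25, 100, 400), seed=0):
    """Unbiased +/-1 random walk with a floor at zero.

    Assumption: a step that would take a lineage below zero is rejected,
    so the lineage stays where it is for that step.
    """
    rng = random.Random(seed)
    complexity = [0] * n_lineages          # every lineage starts at the minimum
    means = {}
    for step in range(1, n_steps + 1):
        for i in range(n_lineages):
            proposal = complexity[i] + rng.choice((-1, 1))
            if proposal >= 0:              # the floor: complexity cannot go below zero
                complexity[i] = proposal
        if step in checkpoints:
            means[step] = sum(complexity) / n_lineages
    return means, complexity

means, final = mean_complexity()
for step in sorted(means):
    print(step, round(means[step], 2))
```

In this sketch no step is ever favored over another, yet the average across lineages keeps creeping upward simply because the floor blocks movement in one direction.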

Gould thus posits no positive force for the increase in complexity. McShea, who concluded that complexity was increasing after all, considers drift to be a positive, but weak, force for the increase of complexity. With Robert Brandon, he coauthored a book arguing that this is the case. A positive review is here. The idea is that mutation is a natural driver for diversity, which is synonymous with complexity. After all, one does not mutate into the same thing that one was. Thus, in the complete absence of selection, evolution progresses into greater complexity. McShea and Brandon called this the Zero-Force Evolutionary Law (hmmm… I pick up a hint of Asimov here).

I think Gould’s idea is very clever, but it contradicts empirical evidence. There are traits that seem to make their carriers fitter over a wide swath of environments; the eusociality of ants must have contributed to their dominance in the world’s ecology. Might complexity be such a trait? Probably not, considering the dominance of bacteria… but we cannot reject that complexity has an effect on fitness overall, as Gould does. Besides, according to Gould, if all of complexity is driven by environmental conditions, then all lineages should show theoretically unlimited movement in complexity. Thus, according to this theory, whenever environments favor the loss of mitochondria in eukaryotes, eukaryotes should lose them and good riddance. Unfortunately, for all strains of eukaryotes, losing mitochondria is death, regardless of the environment, so there is no strain of prokaryotes with eukaryotes as their ancestor. This is the intuition I hope to tighten later. Similarly, of the many billions of cases of cancer that have occurred throughout evolutionary history, there is no species of single-celled organism that had an amphibian, reptile, or mammalian ancestor (I’m not sure about sponges). Once the organism dies, the cancer dies (unless it’s kept alive in a lab). None of the cancer cells could revert to unicellularity, although it’s undoubtedly advantageous for them to do so. Although multicellular to unicellular evolution has certainly occurred, it doesn’t seem to have ever occurred in species where the viability of the organism depends on an intense and obligate integration evolved over billions of years. Thus, contrary to Gould’s claim, there seem to be some plateaus of complexity that, once stepped onto, cannot be descended from.

McShea’s idea, on the other hand, I’m skeptical about. It reminds me of a flavor of mutationism and orthogenesis that has been soundly routed in the course of the history of evolutionary thought, with good reason. Natural selection is an awfully powerful force, strong enough to beat the second law of thermodynamics every time. Most mutations increase diversity, yes, but most mutations also make us closer to a ball of gas, and yet we aren’t balls of gas. The authors seem to believe that it is a gentle breeze of a force blowing in the background, such that if the selection for or against complexity averaged out to nearly zero over time, then this force is sufficient to provide a long term trend. With no mathematical model behind their reasoning, I cannot form an informed opinion of whether this might be true, but I highly doubt it. What I think is much more likely to happen is that the evolution of complexity would be precisely as dictated by natural selection, whether it is a random walk or not, and only increased slightly at all points in time by the gentle breeze. Consider the following 2 minute drawing in GIMP:

Different prediction for the evolution of complexity

It looks terrible, I know, and the axes are unlabeled. Okay okay, x axis is time and y axis is some measure of complexity. Let’s say that natural selection is the black line, so sometimes complexity is selected up and sometimes down, but there’s no overall trend (well, there should be, since it’s a random walk with an absorbing condition, but bear with me here). Red is what I think McShea and Brandon are proposing, that there’s a gentle background force moving complexity up. But I think that the gray line is what would happen: the evolution of complexity, according to McShea and Brandon’s force, would exactly reflect natural selection, with no trend. The force would nudge complexity to be a bit higher than what it would otherwise be, but that’s it.

Wow this post has gotten long already — you can’t say very much in a post! Next post — and I won’t break any promises this time — will deal with my own thoughts on increasing complexity and its links to the holey landscapes model.

Evolution of ethnocentrism with probabilistic strategies

This post shows some preliminary results for probabilistic strategies in Hammond and Axelrod [HA06a, HA06b] style simulations. I wrote the code and collected the data in May of 2008 while working in Tom’s lab. The core of the simulations is the same as the original H&A simulations. The key difference is that instead of having in-group (igs) or out-group (ogs) strategies that are either 0 (defect) or 1 (cooperate), we now allow probabilistic values p in the range of 0 to 1, where an agent cooperates with probability p. Also, we look at both the Prisoner’s dilemma (PD) and the Hawk-Dove (HD) game.

Parameter information

The parameters that were constant across simulations are listed in the table below:

default ptr 0.12
death rate 0.10
mutation rate 0.005
immigration rate 1 per epoch
number of tags 4

For the game matrix, we considered the standard R,S,T,P parametrization, where R is the reward for mutual cooperation, S is the sucker’s payoff for cooperating with a defector, T is the temptation to defect against a cooperator, and P is the punishment for mutual defection. For the HD game we had (R,S,T,P) = (0.02, 0.00, 0.03, x) and for PD we had (0.02, x, 0.03, 0.00), where x is the value listed in the Parameters column of the table of results.
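To make the parametrization concrete, here is a small sketch of the two payoff tuples and of the expected payoff when both agents use probabilistic strategies. It assumes the two agents randomize independently; whether the original simulation sampled actual moves or worked with expectations is not stated here, and the function names are hypothetical.

```python
def payoffs(game, x):
    """Return (R, S, T, P) for the two games described above."""
    if game == "HD":
        return (0.02, 0.00, 0.03, x)
    if game == "PD":
        return (0.02, x, 0.03, 0.00)
    raise ValueError("game must be 'HD' or 'PD'")

def expected_payoff(p_self, p_other, game, x):
    """Expected payoff to an agent cooperating with probability p_self
    against a partner cooperating with probability p_other.

    Assumption: the two agents randomize independently.
    """
    R, S, T, P = payoffs(game, x)
    return (p_self * p_other * R                  # both cooperate
            + p_self * (1 - p_other) * S          # I cooperate, partner defects
            + (1 - p_self) * p_other * T          # I defect, partner cooperates
            + (1 - p_self) * (1 - p_other) * P)   # both defect

for game in ("HD", "PD"):
    print(game, expected_payoff(0.5, 0.5, game, -0.01))
```

With p restricted to 0 or 1 this reduces to the usual deterministic payoff matrix, so the original H&A strategies are a special case.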

A further stressor was added to some of the PD simulations, and is listed in the comments section. Here we varied the number of tags as the simulation went on. A comment of the form ‘XT(+/-)YT(at/per)Zep’ means that we start the simulation with X tags, and increase (+) or decrease (-) the number of tags available to new immigrants by Y tags at (at) the Z-th epoch or at every multiple (per) of Z epochs. This was to study how sensitive our results were to stress induced by new tags.
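The comment format is compact enough that a short parser may help when reading the table. This is a sketch only: the class and function names are hypothetical, and it assumes that a change scheduled for epoch Z takes effect on epoch Z itself and that ‘per’ schedules fire at Z, 2Z, 3Z and so on.

```python
import re
from typing import NamedTuple

class TagSchedule(NamedTuple):
    start: int   # X: number of tags at the start of the run
    delta: int   # signed Y: change in the number of tags per event
    mode: str    # 'at' (one-off change) or 'per' (repeating change)
    epoch: int   # Z

_PATTERN = re.compile(r"^(\d+)T([+-])(\d+)T(at|per)(\d+)ep$")

def parse_schedule(comment):
    """Parse a comment of the form XT(+/-)YT(at/per)Zep."""
    match = _PATTERN.match(comment)
    if match is None:
        raise ValueError(f"not a tag schedule: {comment!r}")
    start, sign, step, mode, epoch = match.groups()
    delta = int(step) if sign == "+" else -int(step)
    return TagSchedule(int(start), delta, mode, int(epoch))

def tags_available(schedule, epoch):
    """Tags open to new immigrants at a given epoch.

    Assumption: the change takes effect on epoch Z itself.
    """
    if schedule.mode == "at":
        events = 1 if epoch >= schedule.epoch else 0
    else:
        events = epoch // schedule.epoch
    return schedule.start + schedule.delta * events

schedule = parse_schedule("2T+1Tper100ep")
print(schedule, tags_available(schedule, 250))
```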


The results for each case are averages from 10 worlds. The videos show 2 bar graphs: the left one shows the number of agents with an igs strategy in a given range, and the right does the same for ogs strategy. Note that the y-axis varies between videos, so be careful! The red error bars are standard error from averaging the 10 worlds. The horizontal green line corresponds to 1/10th of the total world population, and the dotted green lines are the error bars for the green line. This line is present so that it is easy to notice world saturation. The videos start at epoch 1 and go through all the epochs in the simulations (the current epoch number is tracked by the number on top). Each frame shows the strategy distributions from that epoch.

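Game | Parameters | Comments | Link
Hawk-Dove | -0.01 | | video
Hawk-Dove | -0.02 | | video
Hawk-Dove | -0.04 | | video
Prisoner’s dilemma | -0.01 | | video
Prisoner’s dilemma | -0.01 | 2T+1Tper100ep | video
Prisoner’s dilemma | -0.01 | 4T+1Tat350ep | video
Prisoner’s dilemma | -0.01 | 4T+4Tat350ep | video
Prisoner’s dilemma | -0.01 | 4T-2Tat350ep | video
Prisoner’s dilemma | -0.02 | | video
Prisoner’s dilemma | -0.04 | | video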


For me, the biggest conclusion of this study was that there was nothing inherently interesting in modifying the H&A model with probabilistic strategies. The popular wisdom is that evolutionary models that are inspired in some way by replicator dynamics (this one would be, with extra spatial constraints and a finite population) do not benefit from adding randomized strategies directly. Instead we can let the probabilistic population distribution of deterministic agents allow for the ‘randomness’. However, we will see next week that this popular wisdom does not always hold: it is possible to construct potentially interesting models based around the issue of randomized vs. deterministic strategies.

I do enjoy this way of visualizing results, even though it is completely unwieldy for print. It confirms previous results [SHH08] on the early competition between ethnocentric and humanitarian agents, by showing that the out-group strategy really doesn’t matter until after world saturation [SHK09] (around epoch 350 in the videos) since it remains uniformly distributed until then. The extra stress conditions of increasing or decreasing the number of tags were a new feature and I am not clear on how they can be used to gain further insights. Similar ideas can be used to study the founder effect, but apart from that I would be interested in hearing your ideas on how such stresses can provide us with new insights. The feature I was most excited about when I did these experiments was addressing both the Hawk-Dove and the Prisoner’s Dilemma. However, since then I have conducted much more systematic examinations of this in the standard H&A model [K10].


[HA06a] Hammond, R.A. & Axelrod, R. (2006) “Evolution of contingent altruism when cooperation is expensive,” Theoretical Population Biology 69:3, 333-338

[HA06b] Hammond, R.A. & Axelrod, R. (2006) “The evolution of ethnocentrism,” Journal of Conflict Resolution 50, 926-936.

[K10] Kaznatcheev, A. (2010) “Robustness of ethnocentrism to changes in inter-personal interactions,” Complex Adaptive Systems – AAAI Fall Symposium

[SHH08] Shultz, T.R., Hartshorn, M. & Hammond R.A. (2008) “Stages in the evolution of ethnocentrism,” Proceedings of the 30th annual conference of the cognitive science society.

[SHK09] Shultz, T.R., Hartshorn, M. & Kaznatcheev, A. (2009) “Why is ethnocentrism more common than humanitarianism?” Proceedings of the 31st annual conference of the cognitive science society.

The evolution of compassion by Robert Wright at TED

An enjoyable video from Robert Wright about the evolution of compassion:

How would you model the evolution of compassion? How would your model differ from standard models of evolution of cooperation? Does a model of compassion necessarily need the agents to have model minds/emotions to feel compassion or can we address it purely operationally like cooperation?

Evolving viable systems

It got cold really, really quickly today… Winter is coming  –GRRM

In my last two posts I wrote about holey adaptive landscapes and some criticisms I had for the existing model. In this post, I will motivate the hijacking of the model for my own purposes, for macroevolution :-) That’s the great thing about models: by relabeling the ingredients as something else, as long as the relationships between elements of the model hold true, we may suddenly have an entirely fresh insight about the world!

As we might recall, the holey adaptive landscapes model proposes that phenotypes are points in n-dimensional space, where n is a very large number. If each phenotype is a node, then mutations that change one phenotype to another can be considered edges. In this space, for large enough n, even nodes with very rare properties can percolate and form a giant connected cluster. Gavrilets, one of the original authors of the model and its main proponent, considered this rare property to be high fitness. This way, evolution never has to cross fitness valleys. However, I find this unlikely; high fitness within a particular environment is a very rare property. If the property is sufficiently rare, then even if the nodes form a giant connected cluster, the connections between nodes may be so tenuous that there is not enough time and population size to explore them.

Think of a blind fruitfly banging its head within a sphere. On the wall of the sphere is pricked a single tiny hole just large enough for the fruitfly to escape. How much time will it take for the fruitfly to escape? The question clearly depends on the size of the sphere. In n-dimensions, where n is large, the sphere is awfully big. Now consider the sphere to be a single highly fit phenotype, and a hole is an edge to another highly fit phenotype. The existence of the hole is not sufficient to guarantee that the fruitfly will find the exit in finite time. In fact, even a giant pack of fruitflies — all the fruitflies that ever existed — may not be able to find it, given all the time that life has evolved on Earth. That’s how incredibly large the sphere is — the exit must not only exist, it must be sufficiently common.

The goal of this post is to detail why I’m so interested in the holey adaptive landscapes model. I’m interested in its property of historicity, the capacity to be contingent and irreversible. I will define these terms more carefully later. Gavrilets has noted this in his book, but I can find no insightful exploration of this potential. I hope this model can formalize some intuitions gained in complex adaptive systems, particularly those of evo-devo and my personal favorite pseudo-philosophical theory, generative entrenchment (also here and here). Gould had a great instinct for this when he argued for the historical contingency of the evolutionary process (consider his work on gastropods, for example, or his long rants on historicity in his magnum opus, although I despair to pick out a specific section).

Before I go on, I must also rant. Gould’s contingency means that evolution is sensitive to initial conditions, yes, but this does not mean history is chaotic in the mathematical sense of chaos. Chaos is not the only way for a system to be sensitive to initial conditions. In fact, mathematical chaos is preeminently ahistorical: just like equilibrating systems, chaotic systems forget history in a hurry, in total contrast to what Gould meant, which is that history should leave an indelible imprint on all of the future. No matter what the initial condition, chaotic systems settle in the same strange attractor, the same distribution over phase space. The exact trajectory depends on the initial condition, yes, but because the smallest possible difference in initial condition quickly translates to a completely different trajectory, it means that no matter how you begin, your future is… chaotic. Consider two trajectories that began very differently: the future difference between those two trajectories is no greater than that between two trajectories that began with the slightest possible difference. The difference in the difference of initial conditions is quickly obliterated by time. Whatever the atheist humanists say, chaos gives no more hope to free will than quantum mechanics. Lots of people seem to have gone down this hopeless route, not least of whom is Michael Shermer, who, among many different places, writes here:

And as chaos and complexity theory have shown, small changes early in a historical sequence can trigger enormous changes later… the question is: what type of change will be triggered by human actions, and in what direction will it go?

If he’s drawing this conclusion from chaos theory, then the answer to his question is… we don’t know, we can’t possibly have an idea, and it doesn’t matter what we do, since all trajectories are statistically the same. If he’s drawing this conclusion from “complexity theory” (not yet a single theory with core results of any sort), then it’s a theory entirely unknown to me.
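The claim that chaotic systems forget how they started can be checked on a toy example. The sketch below uses the logistic map at r = 4, a standard textbook chaotic map chosen here purely for illustration (it is not mentioned in the talk or in Gould), and compares the long-run average gap between two trajectories that start almost identically with the gap between two that start far apart. The starting values, burn-in and run length are arbitrary assumptions.

```python
def logistic_orbit(x0, n_steps, r=4.0):
    """Iterate x -> r x (1 - x); at r = 4 this is a standard chaotic map."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def mean_separation(x0, y0, burn_in=200, n_steps=5000):
    """Average |x_t - y_t| after discarding an initial transient."""
    xs = logistic_orbit(x0, burn_in + n_steps)
    ys = logistic_orbit(y0, burn_in + n_steps)
    gaps = [abs(a - b) for a, b in zip(xs[burn_in:], ys[burn_in:])]
    return sum(gaps) / len(gaps)

close_start = mean_separation(0.2, 0.2 + 1e-10)   # nearly identical beginnings
far_start = mean_separation(0.2, 0.7)             # very different beginnings
print(close_start, far_start)
```

The two averages should come out nearly equal: once the transient is over, nothing in the statistics reveals whether the pair began a hair apart or half the interval apart.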

No, interesting historical contingency is quite different; we will see if the holey landscapes model can more accurately capture its essence.

Things in the holey landscapes model get generally better if we consider the rare property to be viability, instead of high fitness. In fact, Gavrilets mixes the use of “viable” and “highly fit”, although I suspect him to always mean the latter. By viable, I mean that the phenotype is capable of reproduction in some environment, but I don’t care how well it reproduces. For ease of discussion, let’s say that viable phenotypes also reproduce above the error threshold, and that for each there exists an environment where it is able to reproduce with absolute fitness >1. Else, it’s doomed to extinction in all environments, and then it’s not very viable, is it?

It turns out that the resultant model contains an interesting form of irreversibility. I will give the flavor here, while spending the next post being more technical. Consider our poor blind fruitfly, banging its head against the sphere. Because we consider viability instead of “high fitness”, there are now lots of potential holes in the sphere. Each potential hole is a neighboring viable phenotype, but the hole is opened or closed by the environment, which dictates whether that neighboring viable phenotype is fit.

Aha, an astute reader might say, but this is no better than Gavrilets’ basic model. The number of open holes at any point must be very small, since it’s also subject to the double filter of viability and high fitness. How can we find the open holes?

The difference is that after an environmental change, the sphere we are currently in might be very unfit. Thus, the second filter (high fitness) is much less constrictive, since the neighbor merely has to be fitter than the current sphere, which might be on the verge of extinction. A large proportion of viable phenotypes may be “open” holes, as opposed to the basic model, where only the highly fit phenotypes are open. Among viable phenotypes, highly fit ones may be rare, but those that are somewhat more fit than an exceedingly unfit phenotype may be much more common, and it’s only a matter of time, often fairly short time on the geological scale, before any phenotype is rendered exceedingly unfit. So you see, in this model evolution also does not have to cross a fitness valley, but I’m using a much more classical mechanism: peak shifts due to environmental change, rather than a percolation cluster of highly-fit phenotypes.

Now that our happier fruitfly is in the neighboring sphere, what is the chance that it will return to its previous sphere, as opposed to choosing some other neighbor? The answer is… very low. The probability of finding any particular hole has not improved, and verges on impossibility, although the probability of finding some hole is much better. Moreover, the particular hole that the fruitfly went through dictates what holes it will have access to next, and if it can’t return to its previous sphere, then it can’t go back to remake the choice. This gives the possibility of much more interesting contingency than mere chaos.

This mechanism has much in common with Muller’s Ratchet, or Dollo’s Law, and is an attempt to generalize the ratchet mechanism while formalizing what we mean, exactly, by irreversibility. I will tighten the argument next Thursday.

Replicator dynamics of perception and deception

In this note we study the replicator dynamics of perception and deception. In particular, we consider a variant of the prisoner’s dilemma where we have 4 basic strategies: cooperators C, defectors D, perceivers CI, and deceivers DI.

Model 1

Cooperators pay a cost c to provide a benefit b to any other agent. Defectors never pay a cost, but receive benefit from those that donate. Perceivers can detect defectors and thus cooperate only with cooperators, perceivers, and the tricky deceivers. For this perceptive ability, they pay an extra cost k. Deceivers are like defectors, except they are capable of tricking even perceivers into cooperating with them. This deception carries a cost l. Letting p be the proportion of cooperators, q for perceivers, and r for defectors (thus the proportion of deceivers is 1 - p - q - r), we can write down the utility functions for the 4 strategies:

\begin{aligned}  U(C) & = b(p + q) - c \\  U(CI) & = b(p + q) - c(1 - r) - k \\  U(D) & = bp \\  U(DI) & = b(p + q) - l  \end{aligned}

Note that as long as k,l > 0, the only pure evolutionarily stable strategy is to defect. However, in the case of 0 \leq c \leq l \leq k + c, the invasion graph is non-trivial. In particular, a world of

  • CI can be invaded by DI and C, but not by D.
  • DI can be invaded by C or D.
  • C can be invaded by D.
  • D is ESS
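The invasion graph can be checked mechanically from the four utility functions. The sketch below drops a rare mutant (a share eps) into a world of residents and asks whether the mutant earns strictly more. The parameter values are hypothetical, picked only to satisfy 0 \leq c \leq l \leq k + c with b large enough that perceivers earn a positive payoff among themselves.

```python
def utilities(p, q, r, b, c, k, l):
    """Model 1 payoffs; p, q, r are the shares of C, CI, D and the rest is DI."""
    return {
        "C": b * (p + q) - c,
        "CI": b * (p + q) - c * (1 - r) - k,
        "D": b * p,
        "DI": b * (p + q) - l,
    }

def can_invade(resident, mutant, params, eps=1e-6):
    """True if a rare mutant earns strictly more than the resident."""
    share = {"C": 0.0, "CI": 0.0, "D": 0.0, "DI": 0.0}
    share[resident] = 1.0 - eps
    share[mutant] = eps
    u = utilities(share["C"], share["CI"], share["D"], **params)
    return u[mutant] > u[resident]

# Hypothetical parameters satisfying 0 <= c <= l <= k + c, with b > c + k.
params = dict(b=6.0, c=2.0, k=1.0, l=2.5)
strategies = ("C", "CI", "D", "DI")
invaders = {
    res: sorted(m for m in strategies if m != res and can_invade(res, m, params))
    for res in strategies
}
print(invaders)
```

For these values the sketch reproduces the list above: CI is invaded by C and DI, DI by C and D, C by D, and D by nobody.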

This suggests that there is potential for an internal fixed point. Let us equate the utilities to find the fixed point:

  • Setting U(C) = U(CI) we get r = k/c
  • Setting U(C) = U(D) we get q = c/b
  • Setting U(C) = U(DI) we get c = l

Unfortunately, c = l means that this fixed point is a knife-edge phenomenon in the larger parameter space of (b,c,k,l). Thus, let us consider the parameters 0 \leq k \leq c = l \leq b (the last inequality comes from the definition of PD). The dynamics then keep a whole line fixed; parametrizing by p, we get the following stable distribution of strategies:

(p \;, \; c/b \;, \; k/c\; ,\; 1 - c/b - k/c - p)

Here the entries are ordered as (C, CI, D, DI), so that q = c/b and r = k/c as derived above. Note that this places further constraints on our parameters, since we need k/c + c/b \leq 1 or kb + c^2 \leq bc. Thus, the parameter range is reduced to 0 \leq k \leq c(1 - \frac{c}{b}) and 0 < c = l. Note that in the stable distribution the highest social welfare is when there are no DI agents, and it is reasonable to believe that the population would converge to this point on the C-CI-D face of the strategy simplex. Let us compute the stability criterion, with P the payoff matrix restricted to (C, CI, D), s = (x, y, 1 - x - y) an arbitrary strategy on that face, and s^* the fixed point:

\begin{aligned} s \cdot P s^* & = \frac{1}{bc}\begin{pmatrix} x & y & 1 - x - y \end{pmatrix} \begin{pmatrix} b - c & b - c & - c \\ b - c - k & b - c - k & - k \\ b & 0 & 0 \end{pmatrix} \begin{pmatrix} bc - bk - c^2 \\ c^2 \\ bk \end{pmatrix} \\ & = \frac{1}{bc}\begin{pmatrix} x & y & 1 - x - y \end{pmatrix} \begin{pmatrix} b(bc - bk - c^2) \\ b(bc - bk - c^2) \\ b(bc - bk - c^2) \end{pmatrix} \\ & = b - c - bk/c \end{aligned}

Every s therefore earns the same payoff against s^* as s^* itself does, so the first ESS condition holds only with equality and the second condition, s^* \cdot P s > s \cdot P s, decides. Writing z = s - s^*, whose entries sum to zero, this condition is equivalent to z \cdot P z < 0, and a direct computation gives

z \cdot P z = (b - c) z_2 (z_1 + z_2)

which is positive whenever z_2 and z_1 + z_2 have the same sign. This shows the stable point is not ESS.
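The ESS claim can be cross-checked numerically. The sketch below takes q = c/b and r = k/c from the equal-utility conditions, builds the (C, CI, D) payoff matrix, and applies the standard two-part ESS test to one nearby strategy. The parameter values and the particular deviation are hypothetical choices; using the two-part test is an assumption about which stability criterion is intended.

```python
def payoff_matrix(b, c, k):
    """Payoffs among (C, CI, D); row = focal strategy, column = partner."""
    return [
        [b - c, b - c, -c],
        [b - c - k, b - c - k, -k],
        [b, 0.0, 0.0],
    ]

def payoff(s, t, P):
    """Expected payoff s . P t of mixed strategy s against t."""
    return sum(s[i] * P[i][j] * t[j] for i in range(3) for j in range(3))

# Hypothetical parameters with 0 <= k <= c(1 - c/b), and c = l.
b, c, k = 6.0, 2.0, 1.0
P = payoff_matrix(b, c, k)

# Fixed point on the C-CI-D face, using q = c/b and r = k/c.
q, r = c / b, k / c
s_star = [1.0 - q - r, q, r]

# A nearby strategy with a little more CI and a little less D.
s = [s_star[0], s_star[1] + 0.1, s_star[2] - 0.1]

print(payoff(s, s_star, P), payoff(s_star, s_star, P))   # first-order comparison
print(payoff(s_star, s, P), payoff(s, s, P))             # second-order comparison
```

In this sketch the deviating strategy does exactly as well as s^* against s^*, and strictly better than s^* against itself, which is what failing the ESS test looks like.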

Model 2

From what we learned in model 1, we can define an alternative 3 strategy model which has even more interesting dynamics. Let cooperators pay a cost c_1 to give a benefit b to cooperators, perceivers, and defectors. Let perceivers pay a cost c_2 > c_1 to give a benefit b to cooperators, and perceivers, but not to defectors. Lastly, let defectors pay no cost to give no benefit. Thus, the extra cost perceivers must pay in order to distinguish agents that cooperate from those that don’t is captured in the higher cost of cooperation. For now, we will not worry about deceivers, and write down the utility functions with p as the proportion of cooperators, and q the proportion of perceivers.

\begin{aligned}  U(C) & = b(p + q) - c_1 \\  U(CI) & = (b - c_2)(p + q) \\  U(D) &= bp  \end{aligned}

Note that this game has no pure equilibria, and an interesting, rock-paper-scissors like invasion graph. In particular, a world of

  • CI can be invaded by C but not by D,
  • C can be invaded by D but not by CI, and
  • D can be invaded by CI but not by C.

From this invasion graph we can see that there must be an internal fixed point.

  • Setting U(C) = U(CI) we get p + q = c_1/c_2.
  • Setting U(C) = U(D) we get q = c_1/b

Thus, our fixed point is given by:

(c_1/c_2 - c_1/b\; ,\; c_1/b\; ,\; 1 - c_1/c_2)

The dynamics then cycle around this fixed point. A nice choice of parameters is multiples of (b,c_1,c_2) = (6,2,3).
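For anyone who wants to watch the cycling, here is a minimal sketch that integrates the replicator equation for Model 2 with a hand-written fourth-order Runge-Kutta step. The step size, run length and starting perturbation are arbitrary assumptions, and the helper names are hypothetical.

```python
def utilities(x, b, c1, c2):
    """Model 2 payoffs for shares x = (C, CI, D)."""
    p, q, _ = x
    return [b * (p + q) - c1, (b - c2) * (p + q), b * p]

def replicator_rhs(x, b, c1, c2):
    """dx_i/dt = x_i (U_i - mean U)."""
    u = utilities(x, b, c1, c2)
    mean_u = sum(xi * ui for xi, ui in zip(x, u))
    return [xi * (ui - mean_u) for xi, ui in zip(x, u)]

def rk4_step(x, dt, *params):
    """One classical Runge-Kutta step of the replicator equation."""
    def shifted(base, slope, h):
        return [bi + h * si for bi, si in zip(base, slope)]
    k1 = replicator_rhs(x, *params)
    k2 = replicator_rhs(shifted(x, k1, dt / 2), *params)
    k3 = replicator_rhs(shifted(x, k2, dt / 2), *params)
    k4 = replicator_rhs(shifted(x, k3, dt), *params)
    return [xi + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
            for xi, s1, s2, s3, s4 in zip(x, k1, k2, k3, k4)]

def distance(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

b, c1, c2 = 6.0, 2.0, 3.0
fixed_point = [c1 / c2 - c1 / b, c1 / b, 1 - c1 / c2]

# Start very close to the fixed point and follow the trajectory.
x = [fixed_point[0] + 0.001, fixed_point[1], fixed_point[2] - 0.001]
start_distance = distance(x, fixed_point)
for _ in range(2400):            # 2400 steps of dt = 0.01, i.e. up to t = 24
    x = rk4_step(x, 0.01, b, c1, c2)
end_distance = distance(x, fixed_point)
print(fixed_point, x, start_distance, end_distance)
```

With (6,2,3) the fixed point is (1/3, 1/3, 1/3). In this sketch the distance to the fixed point grows as the trajectory circles it, so the cycles look more like a slow outward spiral than closed loops; treat that as an observation about this particular integration rather than a claim about any other implementation of the model.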

It is possible to add DI agents to this model, and we have two choices on how to penalize them. If we want DI to be invaded by D then we should penalize DI by a constant offset, like in Model 1. If we want to have D to DI drift then we should use ideas similar to how CI agents are different in Model 2. This means that we can have C, CI, and D agents receiving benefit b_1 while DI agents are penalized by receiving a benefit b_2 < b_1. I would lean towards the first case, since it is more general.

Evolving past Bruce Bueno de Mesquita’s predictions at TED

Originally, today’s post was going to be about “The evolution of compassion” by Robert Wright, but a September 3rd Economist article caught my attention. So we will save compassion for another week, and instead quickly talk about predicting human behavior. The Economist discusses several academics and firms that specialize in using game theory to predict negotiation behavior and quantify how it can be influenced. The article included a companion video highlighting Bruce Bueno de Mesquita, but I decided to include an older TED talk “Bruce Bueno de Mesquita predicts Iran’s future” instead:

I like the discussion of game theoretic predictions in the first part of the video. I want to concentrate on that part and side-step the specific application to Iranian politics at the end of the video.

Bruce Bueno de Mesquita clearly comes from a political science background, and unfortunately concentrates on very old game theory. However, we know from EGT that many of these classical assumptions are unjustified. In particular, Bueno de Mesquita says there are only two exceptions to rationality: 2-year-olds and schizophrenics. Of course, this means we have to ignore classical results such as those of Shafir & Tversky [ST92,TS92] and basically the whole field of neuroeconomics.

The speaker also tries to build a case for modeling by scaring us with factorials and ascribing magic powers to computers. He gives the example of being able to keep track of all possible interactions of 5 people in your head, but not of 10. However, as we know from basic complexity theory, working with problems that grow in difficulty as factorials is not possible for computers either. In particular, if Bueno de Mesquita’s simple argument held, then for a few dozen people, all the computing power on Earth would not be enough to run his simulations. Thus, the real reason behind the need for computational modeling (or game theory software as the Economist article calls it) is not one of simply considering all interactions. You still need great ideas and beautiful models to cut down the set of possible interactions to ones that can be tractably analyzed.
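A quick back-of-the-envelope calculation shows how fast factorials get out of hand. The throughput below (one billion orderings checked per second) is a made-up round number for a single machine, not a measurement of any real system, and treating “all interactions” as n! orderings follows the talk’s framing rather than any specific model.

```python
import math

# Hypothetical throughput: one billion interaction orderings checked per second.
CHECKS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def brute_force_years(n_people):
    """Years needed to enumerate n! orderings at the assumed throughput."""
    return math.factorial(n_people) / CHECKS_PER_SECOND / SECONDS_PER_YEAR

for n in (5, 10, 20, 30):
    print(n, math.factorial(n), f"{brute_force_years(n):.3g} years")
```

Each extra person multiplies the work by the new group size, so faster hardware only buys a handful of additional participants.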

Of course, the way we actually overcome this ridiculous explosion in complexity is by using problem-specific knowledge to constrain our possible influences and interactions. My favorite graphic of the talk is the influence graph of the president of the United States. Not because it is a new idea, but because understanding the function and design of such networks is central to modern EGT. A classic example is the work on selection amplifiers [LHN05] which showed the weaknesses of hierarchical structures such as the president’s influence network for promoting good ideas.

Although Bueno de Mesquita’s accuracy of predictions is impressive (though the 90% he cites is also misleading; note that simply predicting the opposite of the expert opinion would yield similar results), his methods are outdated. If we want to take game theoretic prediction to the next step, we must consider realistic bounds on the rationality of agents, reasonably simple feedback and update rules, and computationally tractable equilibrium concepts. All of these are more likely to come from work on questions like the evolution of cooperation than from think-tanks’ bigger and bigger ‘game theory software’.

I tried to keep my comments brief, so that you can enjoy your weekend and the video. Please leave your thoughts and analysis of the talk and article in the comments. Do you think evolutionary game theory can improve the predictive power of these classic models? Why or why not?


[LHN05] Lieberman, E., Hauert, C., and Nowak, M.A. (2005). Evolutionary dynamics on graphs. Nature, 433, 312-316.
[ST92] Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reasoning and choice. Cognitive Psychology, 24, 449-474.
[TS92] Tversky, A., & Shafir, E. (1992). The disjunction effect in choice under uncertainty. Psychological Science, 3, 305-309.

Criticisms of holey adaptive landscapes

:-) My cats say hello.

In my last post I wrote about holey adaptive landscapes, a model of evolution in very high dimensional space where there is no need to jump across fitness valleys. The idea is that if we consider the phenotypes of organisms to be points in n-dimensional space, where n is some large number (say, tens of thousands, as in the number of our genes), then high fitness phenotypes easily percolate even if they are rare. By percolate, I mean that most high-fitness phenotypes are connected in one giant component, so evolution from one high fitness “peak” to another does not involve crossing a valley; rather, there are only high fitness ridges that are well-connected. This is why the model is considered “holey”: highly fit phenotypes are entirely connected within the fitness landscape, but they run around large “holes” of poorly fit phenotypes that seem to be carved out from the landscape.

This is possible because as n (the number of dimensions) increases, the number of possible mutants that any phenotype can become also increases. The actual rate of increase depends on the model of mutation and can be linear, if we consider n to be the number of genes, or exponential, if we consider n to be the number of independent but continuous traits that characterize an organism. Once the number of possible mutants becomes so large that all highly fit phenotypes have, on average, another highly fit phenotype as a neighbor, then percolation is assured.

More formally, we can consider the basic model where highly fit phenotypes are randomly distributed over phenotype space: any phenotype has probability p_\omega of being highly fit. Let S_m be the average size of the set of all mutants of any phenotype. For example, if n is the number of genes and the only mutations we consider are the loss-of-function of single genes, then S_m is simply n, since this is the number of genes that can be lost, and therefore is the number of possible mutants. Percolation is reached if S_m>\dfrac{1}{p_\omega}. Later extensions also consider cases where highly fit phenotypes exist in clusters and showed that percolation is still easily achievable (Gavrilets’ book Origin of Species, Gravner et al.)
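To make the threshold concrete, here is a minimal sketch of the condition S_m > 1/p_\omega, read as "the expected number of highly fit neighbors exceeds one". The function names and the example values of n and p_\omega are my own illustrative assumptions, not numbers from Gavrilets or Gravner et al.

```python
def expected_fit_neighbors(s_m: float, p_fit: float) -> float:
    """Mean number of highly fit phenotypes among the s_m mutants of a phenotype,
    assuming each mutant is independently highly fit with probability p_fit."""
    return s_m * p_fit


def percolates(s_m: float, p_fit: float) -> bool:
    """Basic-model condition S_m > 1/p_omega, i.e. S_m * p_omega > 1."""
    return expected_fit_neighbors(s_m, p_fit) > 1.0


if __name__ == "__main__":
    n_genes = 20_000  # hypothetical: single-gene loss-of-function, so S_m = n
    for p_fit in (1e-3, 1e-4, 1e-5):  # hypothetical values of p_omega
        print(f"p_omega={p_fit:g}: mean fit neighbors="
              f"{expected_fit_neighbors(n_genes, p_fit):g}, "
              f"percolates={percolates(n_genes, p_fit)}")
```

With S_m linear in n, the condition flips from satisfied to violated as soon as p_\omega drops below 1/n, which is why the value of p_\omega matters so much in what follows.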

I have several criticisms of the basic model. As an aside, I find criticism to be the best way we can honor any line of work; it means we see a potential worthy of a great deal of thought and improvement. My criticisms are as follows:

1) We have not a clue what p_\omega is, not the crudest ball-park idea. To grapple with this question, we must understand what makes an admissible phenotype. For example, we certainly should not consider any combination of atoms to be a phenotype. The proper way to define an admissible phenotype is by defining the possible operations (mutations) that move us from one phenotype to another, that is, we must define what a mutation is. If only DNA mutations are admissible operations, and if the identical DNA string produces the same phenotype in all environments (both risible assumptions, but let’s start here), then the space of all admissible phenotypes is the set of all possible strings of DNA. Let us consider only genomes of a billion letters in length. The size of this space is, of course, 4^{10^9}. What fraction of these combinations are highly fit? The answer must be a truly ridiculously small number. So small that if S_m\approx O(n), I would imagine that there is no way that highly fit phenotypes reach percolation.

Now, if S_m\approx O(a^n), that is a different matter altogether. For example, Gravner et al. argued that a\approx 2 for continuous traits in a simple model. If n is in the tens of thousands, my intuition tells me it’s possible that highly fit phenotypes reach percolation, since exponentials make really-really-really big numbers really quickly. Despite well known evidence that humans are terrible at intuiting giant and tiny numbers, the absence of fitness valleys becomes at least plausible. But… it might not matter, because:

2) Populations have finite size, and evolution moves in finite time. Thus, the number of possible mutants that any phenotype will in fact explore is linear in population size and time (even if the number it can potentially explore is much larger). Even if the number of mutants, S_m, grows exponentially with n, it doesn’t matter if we never have enough population or time to explore that giant number of mutants. Thus, it doesn’t matter that highly fit phenotypes form a percolating cluster if the ridges that connect peaks aren’t thick enough to be discovered. Not only must there be highly-fit neighbors, but in order for evolution to never have to cross fitness valleys, highly-fit neighbors must be common enough to be discovered. Else, if everything populations realistically discover is of low fitness, then evolution has to cross fitness valleys anyway.

How much time and population is realistic? Let’s consider bacteria, which number around 5\times 10^{30}. In terms of generation time, let’s say they divide once every twenty minutes, the standard optimal laboratory doubling time for E. coli. Most bacteria in natural conditions have much slower generation times, so this is a generous estimate. Then if bacteria evolved 4.5 billion years ago, we have had approximately 118260000000000, or ~1.2\times 10^{14}, generations. The total number of bacteria sampled across all evolution is therefore on the order of 6\times 10^{44}. Does that sound like a large number? Because it’s not. That’s the trouble with linear growth. Against 4^{10^9}, this is nothing. Even against 2^{10000} (where we consider 10000 to be n, the dimension number), 6\times 10^{44} is nothing. That is, we simply don’t have time to test all the mutants. Highly fit phenotypes had better make up more than \dfrac{1}{6\times 10^{44}} of the phenotype space, else we’ll never discover them. Is \dfrac{1}{6\times 10^{44}} small? Yes. Is it small enough? I’m not sure. Possibly not. In any case, this is the proper number to consider, not, say, 2^{10000}. The fact that S_m\approx O(a^n) is so large is a moot point.
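The back-of-the-envelope arithmetic above can be reproduced in a few lines. This is only a sketch under the same assumptions as the paragraph (a constant population of 5\times 10^{30}, a 20-minute generation time, 4.5 billion years, 365-day years); the variable names are mine. The comparison is done on a log scale because 4^{10^9} is far too large to hold as a float.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60   # ignoring leap years
generation_minutes = 20            # optimistic laboratory doubling time
years = 4.5e9                      # time span assumed in the text
population = 5e30                  # bacteria alive at any one time

generations = years * MINUTES_PER_YEAR / generation_minutes
total_sampled = population * generations

# Compare orders of magnitude via base-10 logarithms.
log10_sampled = math.log10(total_sampled)
log10_small_space = 10_000 * math.log10(2)   # log10 of 2^10000
log10_genome_space = 1e9 * math.log10(4)     # log10 of 4^(10^9)

if __name__ == "__main__":
    print(f"generations         ~ {generations:.4g}")
    print(f"bacteria sampled    ~ {total_sampled:.3g}")
    print(f"log10(sampled)      ~ {log10_sampled:.1f}")
    print(f"log10(2^10000)      ~ {log10_small_space:.1f}")
    print(f"log10(4^(10^9))     ~ {log10_genome_space:.3g}")
```

On the log scale the gap is stark: the sampled count has an exponent in the mid-forties, while even the smaller space 2^{10000} has an exponent in the thousands.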

3) My last criticism I consider the most difficult one for the model to answer. The holey adaptive landscapes model does not take into account environmental variation. To a great extent, it confuses the viable with the highly fit. In his book, Gavrilets often uses the term “viable”, but if we use the usual definition of viable (that is, capable of reproduction), then clearly most viable phenotypes are not highly fit. Different viable phenotypes might be highly fit under different environmental conditions, but fitness itself has little meaning outside of a particular environment.

A straightforward inclusion of environmental conditions into this model is not easy. Let us consider the basic model to apply to viable phenotypes, that is, strings of DNA that are capable of reproduction, under some environment. Let us say that everything Gavrilets et al. have to say is correct with respect to viable phenotypes: that they form a percolating cluster, etc. Now, in a particular environment, these viable phenotypes will have different fitness. If we further consider only the highly fit phenotypes within a certain environment, for these highly fit phenotypes to form a percolating cluster, we would have to apply the reasoning of the model a second time. It would mean that all viable phenotypes must be connected to so many other viable phenotypes that among them would be another highly fit phenotype. Here, we take “highly fit” to be those viable phenotypes that have relative fitness greater than 1-\epsilon, where the fittest phenotype has relative fitness 1. This further dramatizes the inability of evolution to strike on “highly fit” phenotypes through a single mutation in realistic population size and time, since we must consider not p_\omega, but p_v\times p_\omega, where p_v is the probability of being viable and p_\omega is the probability of a viable phenotype being highly fit. Both of these probabilities are almost certainly astronomically small, making the burden on the pitifully small number of 6\times 10^{44} even heavier.

It’s my belief, then, that in realistic evolution with finite population and time, fitness valleys nevertheless have to be crossed. Either there are no highly fit phenotypes a single mutation away, or if such mutations exist, then the space of all possible mutations is so large as to be impossible to fully sample with finite population and time. The old problem of having to cross fitness valleys is not entirely circumvented by the holey adaptive landscapes approach.

Next Thursday, I will seek to hijack this model for my own uses, as a model of macroevolution.

Public goods and prisoner’s dilemma games are usually equivalent

This is an old note showing that Choi and Bowles’ [CB07] public goods (PG) game is equivalent to the classic prisoner’s dilemma (PD) game.

Public goods

To summarize, Choi and Bowles’ PG (and most other PGs) is as follows:

  1. each cooperator contributes some amount c to the public pool
  2. all contributions are summed together and multiplied by some constant factor r to get a public good of value b.
  3. the public good b is divided among all n members (both cooperators and defectors) of the group equally, resulting in each agent receiving \frac{b}{n}.

From this definition, and the assumption that k of the n group members are cooperators (n - k are defectors), we can arrive at the following payoffs for cooperators (P_C) and defectors (P_D):

P_C = \frac{b}{n} - {c}

P_D = \frac{b}{n}

We know that b is the sum of all contributions (kc) times some constant r. Therefore, b = rkc, and we can rewrite the previous equations as:

P_C = \frac{rkc}{n} - {c} = \frac{c}{n}(rk - n)

P_D = \frac{rkc}{n} = \frac{c}{n}(rk)
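As a quick sanity check on the algebra, the three steps of the PG game can be written out directly and compared against the closed forms. This is a minimal sketch; the function name and the example numbers are my own and are not taken from Choi and Bowles.

```python
import math


def pg_payoffs(n: int, k: int, c: float, r: float) -> tuple[float, float]:
    """Return (P_C, P_D) for a public goods game with n members, k cooperators,
    contribution c per cooperator and multiplication factor r."""
    b = r * k * c          # steps 1-2: sum the contributions and multiply by r
    share = b / n          # step 3: divide the public good equally
    return share - c, share


if __name__ == "__main__":
    n, k, c, r = 10, 4, 1.0, 3.0   # hypothetical example values
    p_c, p_d = pg_payoffs(n, k, c, r)
    # Compare with the closed forms (c/n)(rk - n) and (c/n)(rk).
    assert math.isclose(p_c, (c / n) * (r * k - n))
    assert math.isclose(p_d, (c / n) * (r * k))
    print(f"P_C = {p_c:.3f}, P_D = {p_d:.3f}")
```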

Prisoner’s dilemma

A popular formulation of prisoner’s dilemma is based on the cost of giving (\gamma) and benefit of receiving (\beta) (Pardon the Greek notation, I want to reserve c and b for PG). In this version, when an agent cooperates, she pays a cost \gamma in order to give a benefit \beta to the person she is interacting with; a defector pays nothing and gives nothing to his partner. If we have n agents interacting (such that each agent plays a PD with each other agent, and themselves), and of those n agents, k are cooperators then we can construct the payoffs for cooperators (P_C) and defectors (P_D) as follows:

P_C = k\beta - n\gamma

A cooperator receives a benefit from every cooperator (k\beta) but also pays a cost to every agent she interacts with (n\gamma). A defector on the other hand, only benefits and pays no cost:

P_D = k\beta

Now, if we set the cost of cooperation \gamma = \frac{c}{n}, and the benefit \beta = \frac{rc}{n} and substitute into the above equations, we get:

P_C = k\frac{rc}{n} - n\frac{c}{n} = \frac{c}{n}(rk - n)

P_D = k\frac{rc}{n} = \frac{c}{n}(rk)
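The substitution can also be checked numerically. The sketch below, with function names of my own choosing, computes the PD payoffs under \gamma = \frac{c}{n} and \beta = \frac{rc}{n} (self-interaction included, as in the text) and compares them with the PG payoffs for every possible number of cooperators in a small group.

```python
import math


def pg_payoffs(n: int, k: int, c: float, r: float) -> tuple[float, float]:
    """Public goods payoffs (P_C, P_D): equal share of r*k*c, minus c for cooperators."""
    share = r * k * c / n
    return share - c, share


def pd_payoffs(n: int, k: int, beta: float, gamma: float) -> tuple[float, float]:
    """Pairwise PD payoffs (P_C, P_D) when each agent plays all n agents,
    including itself: benefit beta from each of k cooperators, and cost
    gamma paid by a cooperator to each of the n agents."""
    return k * beta - n * gamma, k * beta


def games_agree(n: int, c: float, r: float) -> bool:
    """Check PG == PD for k = 0..n under gamma = c/n and beta = r*c/n."""
    gamma, beta = c / n, r * c / n
    for k in range(n + 1):
        pg, pd = pg_payoffs(n, k, c, r), pd_payoffs(n, k, beta, gamma)
        if not all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(pg, pd)):
            return False
    return True


if __name__ == "__main__":
    print(games_agree(n=10, c=1.0, r=3.0))  # hypothetical parameter values
```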


Now, if we simply look at the last two equations for PG and PD, we might conclude equality. However, there is a subtle distinction. The PG game can be characterized by two constant parameters, c and r; PD is characterized by one constant parameter (r) and a dynamic one that depends on group population, \frac{c}{n}. A dynamic game can potentially add an extra level of unneeded complexity. However, I will show that in the case of Choi and Bowles’ PG (and most PG-type games) there is no extra complexity added by interpreting the game as a PD.

The reason that no complexity is added is in how payoffs are interpreted. In Choi and Bowles’ own words:

they reproduce in proportion to their share of the group’s total payoffs

When something is ‘in proportion’, that means if there is a common factor that is shared by all individuals, then it can be set to 1. As we can see from our presentation of the PG and PD equations, all individuals share \frac{c}{n} in common; thus we can replace it by 1 without affecting the dynamics of reproduction (which is what we really care about). Thus, the PG/PD equations become:

P_C = rk - n

P_D = rk

And in the PD game we are safe to replace the cost of cooperating by 1 (instead of \frac{c}{n}), thus eliminating the variable parameter without affecting the dynamics of reproduction. Thus PD and PG really are equivalent, and we are safe to apply our knowledge of how prisoner’s dilemma games are played to the Choi and Bowles simulations.
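The scaling argument is easy to check numerically: if reproduction is proportional to each agent’s share of the group’s total payoff, then multiplying every payoff by the same positive factor leaves the shares unchanged. The sketch below uses hypothetical names and numbers and assumes rk > n so that all payoffs are positive; it is an illustration of the argument, not Choi and Bowles’ actual reproduction rule.

```python
import math


def group_payoffs(n: int, k: int, r: float, scale: float = 1.0) -> list[float]:
    """Payoffs of k cooperators followed by n - k defectors,
    P_C = scale*(r*k - n) and P_D = scale*(r*k); scale plays the role of c/n."""
    return [scale * (r * k - n)] * k + [scale * (r * k)] * (n - k)


def reproduction_shares(payoffs: list[float]) -> list[float]:
    """Each agent's share of the group's total payoff."""
    total = sum(payoffs)
    return [p / total for p in payoffs]


if __name__ == "__main__":
    n, k, r, c = 10, 6, 3.0, 2.0   # hypothetical values with r*k > n
    scaled = reproduction_shares(group_payoffs(n, k, r, scale=c / n))
    unscaled = reproduction_shares(group_payoffs(n, k, r, scale=1.0))
    assert all(math.isclose(a, b) for a, b in zip(scaled, unscaled))
    print("shares are unchanged by the common factor c/n")
```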

The only subtle point left, is the presence of self-interaction. As we will see in a future post, that can prove to be a surprisingly powerful feature.


[CB07] J.-K. Choi and S. Bowles [2007] “The Coevolution of Parochial Altruism and War” Science 318(5850): 636-640

Evolving cooperation at TEDxMcGill 2009

For me, one of the highlights of working on EGT has been the opportunity to present it to the general public. As a shameless plug and a way to start off video Saturdays, I decided to post a link to my TEDxMcGill talk on evolving cooperation. This was from the first TEDxMcGill in 2009:

I think this is the first time I used the knitters’ dilemma as an explanation for PD, which has become my favorite way of introducing the game. If you want to read a more technical overview of the graph you see on the second to last slide, then it is discussed in ref.[KS11]. If you want the comic at the end of the slides, it is xkcd’s “Purity”.

More great TEDxMcGill talks are available here and I recommend checking all of them out. Check back next Saturday for another EGT-related video!


[KS11] A. Kaznatcheev and T.R. Shultz [2011] “Ethnocentrism maintains cooperation, but keeping one’s children close fuels it.” In Proceedings of the 33rd annual conference of the cognitive science society. [pdf]

Holey adaptive landscapes

Hello world :-)

My research interests have veered off pure EGT, but my questions still center around evolution, particularly the evolution of complex systems that are made up of many small components working in unison. In particular, I’ve been studying Gavrilets et al.’s model of holey fitness landscapes; I think it’s a model with great potential for studying macroevolution, or evolution on very long timescales. I’m not the first one to have this idea, of course: Arnold and many others have seen the possible connection also, although I think of it in a rather different light.

In this first post, I will give a short summary of this model, cobbled together from several papers and Gavrilets’ book, the Origin of Species. The basic premise is that organisms can be characterized by a large number of traits. When we say large, we mean very large — thousands or so. Gavrilets envisions this as being the number of genes in an organism, so tens of thousands. The important thing is that each of these traits can change independently of other ones.

The idea that organisms are points in very high dimensional space is not new, Fisher had it in his 1930 classic Genetical Theory of Natural Selection, where he used this insight to argue for micromutationism — in such high dimensional space, most mutations of appreciable size are detrimental, so Fisher argued that most mutations must be small (this result was later corrected by Kimura, Orr and others, who argued that most mutations must be of intermediate size, since tiny mutations are unlikely to fix in large populations).

However, even Fisher didn’t see another consequence of high-dimensional space, which Gavrilets exploited mercilessly. The consequence is that in high-enough dimensional space, there is no need to cross fitness valleys to move from one high fitness phenotype to another; all high fitness phenotypes are connected. This is because connectivity is exceedingly easy in high dimensional space. Consider two dimensions: to get from one point to another, there are only two directions to move in. Every extra dimension offers a new option for such movement. That’s why there’s a minimum dimensionality to chaotic behavior: we can’t embed a strange attractor in a two dimensional phase plane, since trajectories can’t help but cross each other. Three dimensions is better, but n-dimensional space, where n is in the tens of thousands, is really powerful stuff.

Basically, every phenotype — every point in n-D space, is connected to a huge number of other points in n-D space. That is, every phenotype has a huge number of neighbors. Even if the probability of being a highly fit organism is exceedingly small, chances are high that one would exist among this huge number of neighbors. We know that if each highly fit phenotype is, on average, connected to another highly fit phenotype (via mutation), then the percolation threshold is reached where almost all highly fit phenotypes are connected in one giant connected component. In this way, evolution does not have to traverse fitness minima.

If we consider mutations to be point mutations of genes, then mutations can be considered to be a Manhattan distance type walk in n-D space. That’s just a fancy way of saying that we have n genes, and only one can be changed at a time. In that case, the number of neighbors any phenotype has is n, and if the probability of being highly fit is better than 1/n, then highly fit organisms are connected. This is even easier if we consider mutations to be random movements in n-D space. That is, if we consider an organism to be characterized by \mathbf{p}=(p_1, p_2, ... p_n), where p_i is the i^{th} trait, and a mutation from \mathbf{p} results in \mathbf{p_m}=(p_1+\epsilon_1, ... p_n+\epsilon_n), such that \epsilon_i is a random small number that can be negative, and the Euclidean distance between \mathbf{p_m} and \mathbf{p} is less than \delta, where \delta is the maximum mutation size, then the neighbors of \mathbf{p} fill up the volume of a ball of radius \delta around \mathbf{p}. The volume of this ball grows exponentially with n, so even a tiny probability of being highly fit will find some neighbor of \mathbf{p} that is highly fit, because of the extremely large volume even for reasonably sized n.

The fact that evolution may never have to cross fitness minima is extremely important; it means that most of evolution may take place on “neutral bands”. Hartl and Taubes had foreseen this really interesting result. Gavrilets mainly used this result to argue for speciation, which he envisions as a process that takes place naturally with reproductive isolation and has no need for natural selection.

Several improvements over the basic result have been achieved, mostly in the realm of showing that even if highly fit phenotypes are highly correlated (forming “highly fit islands” in phenotype space), the basic result of connectivity nevertheless holds (i.e. there will be bridges between those islands). Gavrilets’ book summarizes some early results, but a more recent paper (Gravner et al.) is a real tour-de-force in this direction. Their last result shows that the existence of “incompatibility sets”, that is, sets of traits that destroy viability, nevertheless does not punch enough holes in n-D space to disconnect it. Overall, the paper shows that even with correlation, percolation (connectedness of almost all highly fit phenotypes) is still the norm.

Next Thursday, I will detail some of my own criticisms of this model and its interpretation. The week after next, I will hijack this model for my own purposes and I will attempt to show that such a model can display a great deal of historical contingency, leading to irreversible, Muller’s Ratchet type evolution that carries on in particular directions even against fitness considerations. This type of model, I believe, will provide an interesting bridge between micro and macroevolution.