Irreversible evolution with rigor

We have now seen that man is variable in body and mind; and that the variations are induced, either directly or indirectly, by the same general causes, and obey the same general laws, as with the lower animals.

— First line read on a randomly chosen page of Darwin’s The Descent of Man, in the Chapter “Development of Man from some Lower Form”. But this post isn’t about natural selection at all, so that quote is suitably random.

The intuition of my previous post can be summarized in a relatively inaccurate but simple figure:

In this figure, the number of systems is plotted against the number of components. As the number of components increases from 1 to 2, the number of possible systems greatly increases, due to the large size of the space of all components (\mathbf{C}). The number of viable systems also increases, since I have yet to introduce a bias against complexity. In the figure, the blue dots are the viable systems, while the dashed lines for the 1-systems represent the space of unviable 1-systems.

If we begin at the yellow dot, an addition operation would move it to the lowest red dot. Through a few mutations — movement through the 2-system space — the process will move to the topmost red dot. At this red dot, losing a component is effectively impossible, since the resulting 1-system would be unviable. To lose a component, the system would have to mutate back to the bottommost red dot, an event that, although not impossible, is exceedingly unlikely if \mathbf{C} is sufficiently large. This way, the number of components will keep increasing.

The number of components won’t increase without bound, however. As I said in my last post, once 1-(1-p_e)^n is large, there are enough arrows emanating from the top red dot (instead of the one arrow in the previous figure) that one of them is likely to hit the viable blues in the 1-systems. At that point, this particular form of increase in complexity will cease.

I’d like to sharpen this model with a bit more rigor. First, however, I want to show a naive approach that doesn’t quite work, at least not in the way that I sold it.

Consider a space of systems \mathbf{S} made up of linearly arranged components drawn from \mathbf{C}. Among \mathbf{S} there are viable systems that are uniformly randomly distributed throughout \mathbf{S}; any S\in\mathbf{S} has a tiny probability p_v of being viable. There is no correlation among viable systems; p_v is the only probability we consider. There are three operations possible on a system S: addition, mutation, and deletion. Addition adds a randomly chosen component from \mathbf{C} to the last spot in S (we will see that the spot is unimportant). Deletion removes a random component from S. Mutation changes one component of S to another component in \mathbf{C} with uniformly equal probability (that is, any component can mutate to any other component with probability \dfrac{1}{|\mathbf{C}|-1}). Each operation resets S, and the result of any operation has probability p_v of being viable.

Time proceeds in discrete timesteps; at each timestep, the probabilities of addition, mutation, and deletion are p_a, p_m, and p_d=1-p_a-p_m respectively. Let the system at time t be S_t. At each timestep, some operation is performed on S_t, resulting in a new system, call it R_t. If R_t is viable, then there is a probability p_n that S_{t+1}=R_t; otherwise S_{t+1}=S_t. Since the only role that p_n plays is to slow down the process, for now we will consider p_n=1.

Thus, if S=C_1C_2...C_n:

Removal of C_i results in C_1C_2...C_{i-1}C_{i+1}...C_n,

Addition of a component B results in C_1C_2...C_nB,

Mutation of a component C_i to another component B results in C_1C_2...C_{i-1}BC_{i+1}...C_n.

Let the initial S be S_0=C_v, where C_v is viable.

Let p_v be small, but \dfrac{1}{p_v}<|\mathbf{C}|.
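To make this concrete, below is a minimal simulation sketch of the process just described, written as a direct transcription of the model above. The specific values of |\mathbf{C}|, p_v, p_a, and p_m are illustrative placeholders, chosen only so that \dfrac{1}{p_v}<|\mathbf{C}| holds; memoizing viability in a dictionary keeps each system’s viability fixed across revisits, as the model requires.

```python
import random

C_SIZE = 10**6       # |C|: size of the component space (illustrative)
p_v = 1e-3           # p_v: probability a system is viable (so 1/p_v < |C|)
p_a, p_m = 0.3, 0.3  # p_a and p_m; p_d = 1 - p_a - p_m = 0.4

viability = {}       # each system's viability is drawn once, then fixed

def viable(system):
    if not system:                        # the empty system is never viable
        return False
    key = tuple(system)
    if key not in viability:
        viability[key] = random.random() < p_v
    return viability[key]

def mutate(c):
    """Uniform draw over C minus the current component c."""
    new = random.randrange(C_SIZE - 1)
    return new + 1 if new >= c else new

def step(S):
    R, r = list(S), random.random()
    if r < p_a:                           # addition: random component, last spot
        R.append(random.randrange(C_SIZE))
    elif r < p_a + p_m:                   # mutation: replace one random component
        i = random.randrange(len(R))
        R[i] = mutate(R[i])
    else:                                 # deletion: remove one random component
        R.pop(random.randrange(len(R)))
    return R if viable(R) else S          # p_n = 1: any viable R is accepted

S = [0]                                   # S_0 = C_v: start at a viable 1-system
viability[(0,)] = True
for t in range(10**6):
    S = step(S)
```

Tracking len(S) over a long run gives a direct check on the waiting-time estimates derived below.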

The process begins at C_v, where additions and mutations are possible. If no additions happen, then in approximately \dfrac{1}{p_m p_v} time, C_v mutates to another viable component, B_v. Let’s say this happens at time t. Since p_n=1, S_{t+1}=B_v. However, since this changes nothing complexity-wise, we shall not consider it for now.

A successful addition takes approximately \dfrac{1}{p_a p_v} time. Let this happen at t_1. Then at t=t_1+1, we have S_{t_1+1}=C_vC_2.

At this point, let us consider three possible events. The system can lose C_v, lose C_2, or mutate C_v. Losing C_2 results in the viable C_v, and the system restarts. This happens in approximately \dfrac{2}{p_d} time and will be the most common event, since the chances of producing a viable C_2 or of mutating into a viable C_3C_2 are both very low. In fact, C_vC_2 must spend \dfrac{2}{p_mp_v} time as itself before it is likely to discover a viable C_3C_2 through mutation, or \dfrac{2}{p_dp_v} before it discovers a viable C_2. The last event isn’t too interesting, since it’s like resetting, but with a viable C_2 instead of C_v, which changes nothing (this lower bound is also where Gould’s insight comes from). Finding C_3C_2 is interesting, however, since this is potentially the beginning of irreversibility.

Since we need \dfrac{2}{p_mp_v} time as C_vC_2 to discover C_3C_2, but each time we discover C_vC_2, it stays that way on average only \dfrac{2}{p_d} time, we must discover C_vC_2 about \dfrac{p_d}{p_mp_v} times before we have a good chance of discovering a viable C_3C_2. Since it takes \dfrac{1}{p_a p_v} timesteps for each discovery of a viable C_vC_2, in total it will take approximately

\dfrac{1}{p_a p_v}\cdot\dfrac{p_d}{p_mp_v}=\dfrac{p_d}{p_ap_mp_v^2}

timesteps before we successfully discover C_3C_2. Phew. For small p_v, we see that it takes an awfully long time before any irreversibility kicks in.

Once we discover a viable C_3C_2, there is a 1-(1-p_v)^2 probability that at least one of C_3 and C_2 is viable by itself, in which case a loss can immediately kick in to restart the system again at a single component. Since each discovered C_3C_2 has probability (1-p_v)^2 of having neither component viable on its own, the number of timesteps before we discover a viable C_3C_2 in which neither is viable by itself is:

\dfrac{p_d}{p_ap_mp_v^2(1-p_v)^2},

which for small p_v is essentially the same as the expression above.

Unfortunately this isn’t quite irreversibility. Now I will show that the time it takes for C_3C_2 to reduce down to a viable single component is on the same order as what it takes to find a viable C_3C_4C_5 or C_4C_2C_5 in which all single deletions (for C_3C_4C_5, the single deletions are C_4C_5, C_3C_5, and C_3C_4) are unviable.

We know that C_3 and C_2 are unviable on their own. Thus, to lose a component viably, C_3C_2 must mutate to C_3C_v (or C_vC_2), such that C_3C_v (or C_vC_2) is viable and C_v is also independently viable. To reach a mutant of C_3C_2 that is viable takes \dfrac{1}{p_mp_v} time. The chance that the mutated component will itself be independently viable is p_v. Thus, the approximate time to find one of the viable systems C_3C_v or C_vC_2 is \dfrac{1}{p_mp_v^2}. To reach C_v from there takes \dfrac{2}{p_d} time, for a total of

\dfrac{2}{p_mp_v^2p_d}

time. It’s quite easy to see that going from C_3C_2 to a three-component system (either C_3C_4C_5 or C_4C_2C_5), such that a loss of a component renders the 3-system unviable, also takes on the order of \dfrac{1}{p_v^2} time. It takes \dfrac{1}{p_ap_v} to discover the viable 3-system C_3C_2C_5; it then takes \dfrac{3}{2p_mp_v} time to reach one of C_3C_4C_5 or C_4C_2C_5 (two thirds of all mutations will hit either C_3 or C_2, and of these mutations, a fraction p_v are viable). Each time a viable 3-system is discovered, the system tends to stay there \dfrac{3}{p_d} time. We must therefore discover viable 3-systems \dfrac{p_d}{2p_mp_v} times before we have a good chance of discovering a viable 3-system that is locked in and cannot quickly lose a component yet remain viable. In total, we need

\dfrac{p_d}{2p_ap_mp_v^2}

time. Since p_m, p_a, and p_d are all relatively large (at least compared to p_v), there is no “force” for the evolution of increased complexity, other than the force of the random walk.
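To get a feel for these magnitudes, here is a quick plug-in of the waiting times derived above, using the same illustrative parameter values as the simulation sketch earlier:

```python
p_a, p_m, p_v = 0.3, 0.3, 1e-3
p_d = 1 - p_a - p_m

# time to a viable 2-system locked against immediate loss
t_lock2 = p_d / (p_a * p_m * p_v**2 * (1 - p_v)**2)  # ~4.5e6
# time for C_3C_2 to reduce back down to a viable 1-system
t_rev2 = 2 / (p_m * p_v**2 * p_d)                    # ~1.7e7
# time to a locked-in viable 3-system
t_lock3 = p_d / (2 * p_a * p_m * p_v**2)             # ~2.2e6

print(t_lock2, t_rev2, t_lock3)  # all of order 1/p_v^2 = 1e6
```

All three sit at order \dfrac{1}{p_v^2}: lock-in gains no head start over reversal, which is the quantitative content of the claim above.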

In the next post, I will back up these statements with simulations and see how this type of process allows us to define different types of structure, some of which increase in complexity.

Fewer Friends, More Cooperation

Cooperation is fundamental to all social and biological systems. If cells did not cooperate, multi-cellular organisms would never have evolved [1]. If people did not cooperate, there would be no nation states [2]. But this wide-scale cooperation is somewhat of a mystery from the perspective of Darwinian evolution, which would seem to favor competition for scarce resources and reproductive success over cooperation. Cooperation incurs a cost to provide a benefit elsewhere. Indeed, a basic finding in agent-based computer simulations with unstructured populations is that evolution favors defection over cooperation [3]. As a result, there has been intensive research into spatially structured populations, attempting to explain the pervasive cooperation seen in nature. Under certain realistic spatial conditions, an agent is more likely to encounter members of its own gene pool than would be expected by chance; and this allows cooperation to evolve [4].

A particularly important study in this tradition is that of Ohtsuki and colleagues [5] at Harvard’s productive Program for Evolutionary Dynamics. Their complex mathematical derivations and computer simulations conveniently conform to a rather simple rule: evolution favors cooperation if the benefit of receiving cooperation b divided by the cost c of giving it exceeds the average number of neighbors k in the population. Or, b/c > k, parallel to Hamilton’s famous rule, where the b/c ratio had to exceed the reciprocal 1/r of the relatedness in order for cooperation to thrive [6].

Ohtsuki et al.’s simulations find that this rule holds in a wide variety of graph structures: lattices, cycles, random regular graphs, random graphs, and scale-free networks. Square lattices involve either von Neumann (k = 4) or Moore (k = 8) neighbors. Cycles involve a circular arrangement of agents where k = 2. In random regular graphs, the links between agents are random except that every agent has an equal number of links (k). Random graphs are similar except that agents have an average of k links, rather than exactly k. Scale-free networks are hub-like graphs generated according to the method of preferential attachment: an agent links with others in proportion to the other’s connectivity [7].

Similar results obtain for two different reproductive schemes: death-birth and imitation. For death-birth updating, in each time step a random individual is chosen to die, and its neighbors compete for the empty site in proportion to their fitness. At each cycle of imitation updating, a random agent keeps its own strategy or imitates a neighbor’s strategy proportional to their fitness. In all cases, fitness is determined by the outcome of each agent’s interactions with its neighbors.

Death-birth updating: Image from Ohtsuki et al. 2006

In this graph [5], a blue co-operator competes with a red defector for a newly freed location via death-birth updating. The co-operating candidate’s fitness is 2b-4c because it receives cooperation from two neighboring co-operators and gives cooperation to four neighbors. The defecting candidate’s fitness is b because it receives cooperation from one co-operator and gives no cooperation.
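As a concrete illustration, here is a minimal sketch of death-birth updating on a cycle (k = 2). The fitness form 1 - w + w \times \text{payoff} is the standard weak-selection convention from this literature, but the values of N, b, c, and w below are my own illustrative choices, not taken from the paper:

```python
import random

# Illustrative parameters: a cycle of N agents, so each agent has k = 2
# neighbors; the rule predicts cooperation is favored when b/c > k.
N, b, c, w = 500, 3.0, 1.0, 0.01   # w is the weak-selection strength

neighbors = [((i - 1) % N, (i + 1) % N) for i in range(N)]
coop = [i < N // 2 for i in range(N)]   # start with a block of co-operators

def fitness(i):
    # receive b from each cooperating neighbor; pay c per neighbor if cooperating
    payoff = sum(b for j in neighbors[i] if coop[j])
    if coop[i]:
        payoff -= c * len(neighbors[i])
    return 1 - w + w * payoff           # weak-selection fitness

random.seed(0)
for t in range(2 * 10**5):
    dead = random.randrange(N)          # a random individual dies...
    cands = neighbors[dead]             # ...and its neighbors compete for the site
    winner = random.choices(cands, weights=[fitness(j) for j in cands])[0]
    coop[dead] = coop[winner]

print(sum(coop) / N)                    # final fraction of co-operators
```

A single run like this is noisy; the b/c > k statement concerns fixation probabilities under weak selection, so testing it properly means averaging many runs while moving b/c below and above k.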

For imitation updating, cooperation evolves as long as b/c > k + 2, the plus 2 because each agent is effectively its own neighbor.

The last sentence of the Ohtsuki paper [5] nicely summarizes the results: “The fewer friends I have, the more strongly my fate is bound to theirs.” This derives from the effect of k, which varies from 2 to 10: as k decreases, smaller values of the b/c ratio suffice to exceed it.

In a very crowded literature, this paper is particularly notable for including both simulations and mathematical analysis; in effect, the simulation results provide empirical confirmation of the mathematics. Typical studies use only one of these methods, leaving readers to wonder whether the results would generalize to parameter settings other than those used in simulations, or whether the mathematical analysis was done properly or can predict empirical simulation results. The paper is also notable for including a fairly wide variety of graph structures. More typically, one sees results for only one particular graph, most often a square lattice. In all of these ways, the Ohtsuki et al. paper serves as inspiration for future theoretical work on evolution.

References

1. Axelrod, R. and W.D. Hamilton, The evolution of cooperation. Science, 1981. 211: p. 1390-1396.

2. Wedekind, C. and M. Milinski, Cooperation through image scoring in humans. Science, 2000. 288: p. 850-852.

3. Nowak, M.A., Evolutionary Dynamics. 2006, Cambridge, MA: Harvard University Press.

4. Lieberman, E., C. Hauert, and M.A. Nowak, Evolutionary dynamics on graphs. Nature, 2005. 433: p. 312-316.

5. Ohtsuki, H., C. Hauert, E. Lieberman, and M.A. Nowak, A simple rule for the evolution of cooperation on graphs and social networks. Nature, 2006. 441: p. 502-505. DOI: 10.1038/nature04605.

6. Hamilton, W.D., The genetical evolution of social behaviour, I. Journal of Theoretical Biology, 1964. 7: p. 1-16.

7. Santos, F.C. and J.M. Pacheco, Scale-free networks provide a unifying framework for the emergence of cooperation. Physical Review Letters, 2005. 95.

Presentation on evolutionary game theory and cognition

Last week Julian sent me an encouraging email:

did you know that you come up first in google for a search “evolutionary theory mcgill university”? You beat all the profs!

The specific link he was talking about was to my slides from the first time I gave a guest lecture for Tom’s Cognitive Science course in 2009. Today, I gave a similar lecture again; my 3rd year in a row giving a guest lecture for PSYC532. The slides are available here.

I am very happy Tom invited me. It is always fun to share my passion for EGT with students, and I like motivating the connections to cognition. As is often the case, some of the questions during the presentation got me thinking. A particular question I enjoyed was along the lines of:

If humanitarians cooperate with everyone, and ethnocentrics only cooperate with in-group, then how can we have lower levels of cooperation when the world is dominated by humanitarians?

This was in reference to a result I presented in [Kaz10] about the decrease in cooperative interactions as the cognitive cost of ethnocentrism increases. In particular, even though ethnocentrics are replaced by humanitarians in the population, we don’t see an increase in the proportion of cooperative interactions. In fact, the replacement triggers a decrease in the proportion of cooperative interactions.

I started with my usual answer of the humanitarians allowing more selfish agents to survive, but then realized a second important factor. When the ethnocentric agents are a minority, they no longer form giant same-tag clusters, and are thus much more likely to be defecting (since they are meeting agents of other tags) than cooperating. Thus, the sizable minority of ethnocentrics tend to defect and decrease the proportion of cooperation when living among a majority of humanitarians. On the other hand, when the ethnocentrics are in the majority they are in same-tag clumps and thus tend to cooperate. Of course, I should more closely analyze the simulation data to test this story.

Another attentive student caught a mistake on slide 14 (page 19 of pdf). I have left it for consistency, but hopefully other attentive readers will also notice it. Thank you for being sharp and catching an error I’ve had in my slides for 2 or 3 years now!

To all the students that listened and asked great questions: thank you! If you have any more queries please ask them in the comments. To everyone: how often do you get new insights from the questions you receive during your presentations?

References

[Kaz10] Kaznatcheev, A. (2010) “The cognitive cost of ethnocentrism.” Proceedings of the 32nd annual conference of the cognitive science society. [pdf]

Irreversible evolution

Nine for mortal men doomed to die.

In the last post I wrote about the evolution of complexity and Gould’s and McShea’s approaches to explaining the patterns of increasing complexity in evolution. That hardly exhausts the vast multitude of theories out there, but I’d like to put down some of my own thoughts on the matter, as immature as they may seem.

My intuition is that if we chose a random life-form out of all possible life-forms, truly a random one — without respect to history, the time it takes to evolve that life-form, etc. — then this randomly chosen life-form would be inordinately complex, with a vast array of structure and hierarchy. I believe this because there are simply many more ways to be alive if one is incredibly complex: more ways to arrange one’s self. This intuition gives me a way to define an entropic background such that evolution is always tempted, along with or against fitness considerations, to relax to high entropy and become this highly complex form of life.

I think this idea is original — at least I haven’t heard of it elsewhere — but in my impoverished reading I might be very wrong, as wrong as when I realized that natural selection can’t optimize mutation rates or evolvability (something well known to at least three groups of researchers before me, as I found out much later). If anyone knows someone who had this idea before, let me know!

I will try to describe how I think this process might come about.

Consider the space of all possible systems, \mathbf{S}. Any system S\in\mathbf{S} is made up of components, chosen out of the large space \mathbf{C}. A system made out of n components I shall call an n-system. Of the members of \mathbf{S}, let there be a special property called “viability”. We will worry later about what exactly viability means, for now let’s simply make it an extremely rare property, satisfied by a tiny fraction, 0<p_v\ll1, of \mathbf{S}.

At the beginning of the process, let there be only 1-systems, or systems of one component.  If \mathbf{C} is large enough, then somewhere in this space is at least one viable component, call this special component C_v. Somehow, through sheer luck, the process stumbles on C_v. The process then dictates some operations that can happen to C_v. For now, let us consider three processes: addition of a new component, mutation of the existing component, and removal of an existing component. The goal is to understand how these three operations affect the evolution of the system while preserving viability.

Let us say that viability is a highly correlated attribute: systems close to a viable system are much more likely to be viable than a randomly chosen system. We can introduce three probabilities here — the probability of viability upon the addition of a new component, upon the removal of an existing component, and upon the mutation of an existing component. For now, however, since the process is at a 1-system, removal of components cannot preserve viability — as Gould astutely observed. Thus, we need only consider additions and mutations. For simplicity I will consider only one probability, p_e, the probability of viability upon an edit.

It turns out that two parameters, |\mathbf{C}| (the size or cardinality of \mathbf{C}) and p_e, are critical to the evolution of the system. There are two types of processes that I’m interested in, although there are more than what I list below:

1) “Easy” processes: |\mathbf{C}| is small and p_e is large. There are only a few edits / additions we can make to the system, and most of them are viable.

2) “Hard” processes: |\mathbf{C}| is very large and p_e is small, but not too small. There are many edits possible and only a very small fraction of these edits are viable. However, p_e is not so small that none of these edits are viable. In fact, p_e is large enough that not only are some edits viable, but these edits can be discovered in reasonable time and population size, once we add those ingredients to the model (not yet).

The key point is that easy processes are reversible and hard processes are not. Most existing evolutionary theory has so far dealt with easy processes, which lead to a stable optimum driven only by environmental dictates of what is fittest, because the viable system space is strongly connected. Hard processes, on the other hand, have a viable system space that is connected — but very sparsely so. This model really is an extension of Gavrilets’ models, which is why I spent so much time reviewing them!

Now let’s see how a hard process proceeds. It’s actually very simple: C_v either mutates around to other viable 1-systems, or adds a component to become a viable 2-system. By the definition of a hard process, these two events are possible, but might take a bit of time. Let’s say we are at a 2-system, C_vC_2. Mutations of the 2-system might also hit a viable system. Sooner or later, we will hit a viable C_3C_2 as a mutation of C_vC_2. At this point, it’s really hard for C_3C_2 to become a 1-system: it needs a mutation back to C_vC_2 and then a loss down to C_v. This difficulty is magnified if we hit C_iC_2 as C_3C_2 continues to mutate, since C_i might be a mutational neighbor of C_3 but not of C_v. Due to the large size of the set \mathbf{C}, reverse mutation to C_v becomes virtually impossible. On the other hand, let’s say we reached C_iC_j. Removing a component results in either C_i or C_j. The probability that at least one of them is viable is 1-(1-p_e)^2, which, for very small p_e, is still small. Thus, while growth in size is possible, because a system can grow into many, many different things, reduction in size is much more difficult, because one can only reduce into a limited number of things. Since most things are not viable, reduction is much more likely to result in an unviable system. This isn’t to say reduction never happens or is impossible, but overall there is a very strong trend upwards.

All this is very hand-wavy, and in fact a naive formalization of it doesn’t work — as I will show in the next post. But the main idea should be sound: reduction of components is very easy in the time right after the addition of a component (we can just lose the newly added component), but if no reduction happens for a while (say, by chance), then mutations lock the number of components in. Since each mutation happened in a particular background of components, the viability property after mutation holds only with respect to that background. Changing that background through mutation or addition is occasionally okay, because there is a very large space of things that one can grow or mutate into, but all the possible systems that one can reduce down to may be unviable. For an n-system, there are n possible reductions, but |\mathbf{C}| possible additions and (|\mathbf{C}|-1)\cdot n possible mutations. For as long as |\mathbf{C}| \gg n, this line of reasoning holds. In fact, it holds until 1-(1-p_e)^n becomes large, at which point the probability that a system can lose a component and remain viable becomes significant.
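To put a rough number on “until 1-(1-p_e)^n becomes large”: the probability that at least one single deletion is viable crosses one half at roughly n \approx \ln(2)/p_e components, so the upward trend should stall at a size inversely proportional to p_e. A quick check, with an illustrative value of p_e:

```python
import math

p_e = 1e-3  # illustrative edit-viability probability

p_loss = lambda n: 1 - (1 - p_e)**n  # chance some single deletion is viable

n_half = math.log(2) / -math.log(1 - p_e)
print(n_half)  # ~693, close to ln(2)/p_e
for n in (10, 100, 1000, 5000):
    print(n, round(p_loss(n), 3))  # 0.01, 0.095, 0.632, 0.993
```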

Phew. In the next post I shall try to tighten this argument.

Replicator dynamics and cognitive cost of agency

During the June 3rd Friday meeting of the LNSC, Akash Venkat asked a fun question about how simple it is to remember your strategy. I interpreted this as a question about how difficult it is to have a deterministic or low-entropy strategy — in other words, a question about the fitness of deterministic vs. randomized strategies. Last week I expressed my skepticism towards randomized strategies for ethnocentrism; this post is meant as a potential counter-balance, with ideas we can use to look at randomized strategies.

After the LNSC meeting as I was on the train back to Waterloo, I did some calculations to see how to start thinking about this question. In particular, how to properly tax the cognition associated with behaving non-randomly.

The ability to perform actions (as opposed to randomly floating around in an environment) can be seen as the key quality of agency. In an EGT setting, we can capture this idea by saying that strategies close to random have low agency, and strategies close to deterministic have high agency. Having high agency requires more sophisticated cognition on the part of the agent, and thus we can associate a cost of k with having high agency. Equivalently, we can think of giving a bonus of k to low-agency players. This allows us to start looking at a toy model right away. Let us consider a general cooperator-defector game:

\begin{pmatrix}  1 & U \\  V & 0  \end{pmatrix}

For this game we have the standard deterministic strategies C and D, and we will now add the random strategy R, which plays C with 50% probability and otherwise plays D. Thus R captures the idea of the least possible agency, and we give it a bonus of k to fitness. Letting p be the proportion of C agents, and q the proportion of R agents, we can write down the utilities for all three strategies:

\begin{aligned}  U(C) & = (p + q/2) + (1 - p - q/2)U \\  &= (p + q/2)(1 - U) + U \\  U(D) &= (p + q/2)V \\  U(R) &= \frac{U(C) + U(D)}{2} + k \\  &= (p + \frac{q}{2})(\frac{1 - U + V}{2}) + \frac{U}{2} + k  \end{aligned}

From these equations, we can see that U(R) > U(C),U(D) if k > k^* where:

\begin{aligned}  k^* & = \frac{|U(C) - U(D)|}{2} \\  & = |(p + \frac{q}{2})(\frac{1 - (U + V)}{2}) + \frac{U}{2}|  \end{aligned}

Thus, an agent is not willing to pay more than k^* in order to have agency, because whenever they pay more than k^* they will be dominated by the non-agent strategy R. Note that k^* is a function of p and q; thus it corresponds to the most agents are willing to pay in a given population distribution. To eliminate the dependence on p and q, we can consider a population with q arbitrarily close to 1. This gives us a maximum value of k for which R is not ESS:

k^{\text{ESS}} = \frac{|1 + U - V|}{4}

In other words, if we are really close to the line in game space given by V = 1 + U then agency will evolve only for very small values of k. Thus, we can think of this as the no agency line.
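A quick numeric sanity check of these thresholds (a sketch; the sampling ranges for U, V, p, and q are arbitrary choices):

```python
import random

def utilities(U, V, p, q, k):
    uC = (p + q / 2) * (1 - U) + U   # U(C)
    uD = (p + q / 2) * V             # U(D)
    uR = (uC + uD) / 2 + k           # U(R)
    return uC, uD, uR

random.seed(1)
for _ in range(10**4):
    U, V = random.uniform(-2, 2), random.uniform(-2, 2)
    p = random.uniform(0, 1)
    q = random.uniform(0, 1 - p)

    uC, uD, _ = utilities(U, V, p, q, 0)
    k_star = abs(uC - uD) / 2
    # R dominates both C and D exactly when k > k*
    for k in (k_star - 0.01, k_star + 0.01):
        uC2, uD2, uR = utilities(U, V, p, q, k)
        assert (uR > max(uC2, uD2)) == (k > k_star)

    # in the q -> 1 limit, k* approaches |1 + U - V| / 4
    uC, uD, _ = utilities(U, V, 0.0, 1.0, 0)
    assert abs(abs(uC - uD) / 2 - abs(1 + U - V) / 4) < 1e-12
```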

Although at first sight the non-agent strategy being 50%-50% seems natural, it is in fact arbitrary. If we are taking the viewpoint that the non-agent players are simply allowing the environment to pick the action for them, then there is no reason to assume that the environment is unbiased between C and D. It is very reasonable to believe that action C might be more difficult than action D (or vice versa) and that the environment will have a non-50%-50% chance of selecting one or the other for the player.

We could introduce a new parameter r for the environment’s probability of picking C (so in the previous discussion we had r = 0.5) and parametrize the environment by three variables: U, V, and r. It is obvious how this would modify the previous equations into:

\begin{aligned}  U(R) & = rU(C) + (1 - r)U(D) + k \\  & = (p + \frac{q}{2})(r(1 - U) + (1 - r)V) + rU + k \\  k^* & = \max((1 - r)(U(C) - U(D)), r(U(D) - U(C))) \\  k^{\text{ESS}} & = \frac{\max((1 - r)(1 + U - V),r(V - U - 1))}{2}  \end{aligned}

Note that this equation is not fundamentally different from the previous ones, and just provides a linear offset to the cognitive cost. This tells us that the natural measure of complexity should be linear in the agent’s degree of randomness. In other words, for a strategy space where an agent can evolve any randomness factor s, and the environment has a preferred randomness r, the natural choice for cognitive complexity k(s) is:

k(s) = k \max(\frac{s - r}{1 - r},\frac{r - s}{r})
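The same style of check extends to the biased threshold (again a sketch with arbitrary sampling ranges):

```python
import random

random.seed(2)
for _ in range(10**4):
    U, V = random.uniform(-2, 2), random.uniform(-2, 2)
    r = random.uniform(0.05, 0.95)   # environment's bias towards C
    p = random.uniform(0, 1)
    q = random.uniform(0, 1 - p)

    uC = (p + q / 2) * (1 - U) + U
    uD = (p + q / 2) * V
    k_star = max((1 - r) * (uC - uD), r * (uD - uC))
    # the biased R dominates both C and D exactly when k > k*
    for k in (k_star - 0.01, k_star + 0.01):
        uR = r * uC + (1 - r) * uD + k
        assert (uR > max(uC, uD)) == (k > k_star)
```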

Do you think this is a viable model for the cognitive cost of agency? Or should we capture agency in some other way? What do you expect will happen when we take this game over to a structured environment? I expect that unlike the well mixed case, in certain circumstances, the agents will be willing to pay for access to randomness (instead of access to determinism).

Howard Rheingold on collaboration at TED

Howard Rheingold sows some of the seeds of evolutionary game theoretic thinking at TED.

Rheingold talks about the prisoner’s dilemma, ultimatum game, and tragedy of the commons (or public good), and how they can be modified to facilitate collaboration. Do you think evolutionary game theory can give us serious insights on human cooperation? Or is it just too simple of an approximation?