Mutation-bias driving the evolution of mutation rates

In classic game theory, we are often faced with multiple potential equilibria and no unequivocal way to choose between them. If you’ve ever heard Artem justify dynamic approaches, such as evolutionary game theory, then you’ve seen this equilibrium selection problem take center stage. Natural selection has an analogous ‘problem’ of many local fitness peaks. Is the selection between them simply an accidental historical process? Or is there a method to the madness that is independent of the environment that defines the fitness landscape and that can produce long-term evolutionary trends?

Two weeks ago, in my first post of this series, I talked about an idea Wallace Arthur (2004) calls “developmental bias”, where the variation of traits in a population can determine which fitness peak the population evolves to. The idea is that if variation is generated more frequently in a particular direction, then fitness peaks in that direction are more easily discovered. Arthur hypothesized that this mechanism can be responsible for long-term evolutionary trends.

A very similar idea was discovered and called “mutation bias” by Yampolsky & Stoltzfus (2001). The difference between mutation bias and developmental bias is that Yampolsky & Stoltzfus (2001) described the idea in the language of discrete genetics rather than trait-based phenotypic evolution. They also did not invoke developmental biology. The basic mechanism, however, was the same: if a population is confronted with multiple fitness peaks nearby, mutation bias will make particular peaks much more likely.

In this post, I will discuss the Yampolsky & Stoltzfus (2001) “mutation bias”, consider applications of it to the evolution of mutation rates by Gerrish et al. (2007), and discuss how mutation is like and unlike other biological traits.


Variation for supply driven evolution

I’ve taken a very long hiatus (nearly 5 years!) from this blog. I suppose getting married and getting an MD are good excuses, but Artem has very kindly let me return. I greatly appreciate this chance, because I’d like to summarize an idea I have been working on for a while. So far, only two publications have come out of it (Xue et al., 2015a,b), but it’s an idea that has me excited. So excited that I defended a thesis on it this Tuesday. For now, I call it supply-driven evolution, where I try to show how the generation of variation can determine long-term evolution.

Evolutionary theoreticians have long known that how variation is generated has a decisive role in evolutionary outcome. The reason is that natural selection can only choose among what has been generated, so focusing on natural selection will not produce a full understanding of evolution. But how does variation affect evolution, and can variation be the decisive factor in how evolution proceeds? I believe that the answer is “frequently, yes,” because it does not actually compete with natural selection. I’ll do a brief overview of the literature in the first few posts. By the end, I hope to show how this mechanism can explain some forms of irreversible evolution, which I blogged about five years ago.


Irreversible evolution with rigor

We have now seen that man is variable in body and mind; and that the variations are induced, either directly or indirectly, by the same general causes, and obey the same general laws, as with the lower animals.

— First line read on a randomly chosen page of Darwin’s The Descent of Man, in the Chapter “Development of Man from some Lower Form”. But this post isn’t about natural selection at all, so that quote is suitably random.

The intuition of my previous post can be summarized in a relatively inaccurate but simple figure:

In this figure, the number of systems is plotted against the number of components. As the number of components increases from 1 to 2, the number of possible systems increases greatly, due to the large size of the space of all components (\mathbf{C}). The number of viable systems also increases, since I have yet to introduce a bias against complexity. In the figure, blue represents the viable systems, while dashed lines for the 1-systems represent the space of unviable 1-systems.

If we begin at the yellow dot, an addition operation would move the process to the lowest red dot. Through a few mutations (movement through the 2-system space), the process will move to the topmost red dot. At this red dot, losing a component is impossible, since losing either component would make the system unviable. To lose a component, the process would have to back-mutate to the bottommost red dot, an event that, although not impossible, is exceedingly unlikely if \mathbf{C} is sufficiently large. In this way, the number of components keeps increasing.

The number of components won’t increase without bound, however. As I said in my last post, once 1-(1-p_e)^n is large, there are enough arrows emanating from the top red dot (instead of the one arrow in the previous figure) that one of them is likely to hit one of the viable (blue) 1-systems. At that point, this particular form of increase in complexity will cease.

I’d like to sharpen this model with a bit more rigor. First, however, I want to show a naive approach that doesn’t quite work, at least according to the way that I sold it.

Consider a space of systems \mathbf{S} made up of linearly arranged components drawn from \mathbf{C}. Among \mathbf{S} there are viable systems that are uniformly randomly distributed throughout \mathbf{S}; any S\in\mathbf{S} has a tiny probability p_v of being viable. There is no correlation among viable systems; p_v is the only probability we consider. There are three operations possible on a system S: addition, mutation, and deletion. Addition adds a randomly chosen component from \mathbf{C} to the last spot in S (we will see that the spot is unimportant). Deletion removes a random component from S. Mutation changes one component of S into another component in \mathbf{C} with uniformly equal probability (that is, any component can mutate to any other component with probability \dfrac{1}{|\mathbf{C}|-1}). Each operation produces a new system, and a system that has not been encountered before is viable with probability p_v, independently of its neighbors.

Time proceeds in discrete timesteps. At each timestep, the probabilities of addition, mutation, and deletion are p_a, p_m and p_d=1-p_a-p_m, respectively. Let the system at time t be S_t. At each timestep, some operation is performed on S_t, resulting in a new system; call it R_t. If R_t is viable, then with probability p_n we set S_{t+1}=R_t; otherwise S_{t+1}=S_t. Since the only role that p_n plays is to slow down the process, for now we will consider p_n=1.

Thus, if S=C_1C_2...C_n:

Removal of C_i results in C_1C_2...C_{i-1}C_{i+1}...C_n,

Addition of a component B results in C_1C_2...C_nB

Mutation of a component C_i to another component B results in C_1C_2...C_{i-1}BC_{i+1}...C_n
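The three operations are easy to express on tuples. Below is a minimal Python sketch, assuming that a system is a tuple of integer component IDs, that \mathbf{C} is the integers 0 through n_components - 1, and that positions are 0-indexed (unlike the 1-indexed C_i above). The function names are my own, hypothetical ones.

```python
import random


def delete(system, i):
    """Removal of the component at (0-indexed) position i."""
    return system[:i] + system[i + 1:]


def add(system, b):
    """Addition of component b to the last spot of the system."""
    return system + (b,)


def mutate(system, i, b):
    """Mutation of the component at position i into a different component b."""
    if b == system[i]:
        raise ValueError("a mutation must change the component")
    return system[:i] + (b,) + system[i + 1:]


def random_mutation(system, n_components, rng):
    """Mutate a uniformly chosen position into one of the other |C| - 1
    components, each with probability 1 / (|C| - 1)."""
    i = rng.randrange(len(system))
    b = rng.randrange(n_components - 1)  # draw from |C| - 1 values...
    if b >= system[i]:
        b += 1  # ...and skip over the current component
    return mutate(system, i, b)


S = (1, 2, 3)  # C_1 C_2 C_3
print(delete(S, 1))     # (1, 3)
print(add(S, 9))        # (1, 2, 3, 9)
print(mutate(S, 1, 7))  # (1, 7, 3)
print(random_mutation(S, 10, random.Random(0)))
```

Drawing the replacement from |\mathbf{C}|-1 values and skipping the current one keeps every alternative component equally likely, matching the \dfrac{1}{|\mathbf{C}|-1} rule.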

Let the initial S be S_0=C_v, where C_v is viable.

Let p_v be small, but \dfrac{1}{p_v}<|\mathbf{C}|.

The process begins on C_v, where only additions and mutations can preserve viability (deleting the only component leaves nothing). If no additions happen, then in approximately \dfrac{1}{p_m\cdot p_v} time, C_v mutates to another viable component, B_v. Let’s say this happens at time t. Since p_n=1, S_{t+1}=B_v. However, since this changes nothing complexity-wise, we shall not consider it for now.

A successful addition takes approximately \dfrac{1}{p_a\cdot p_v} time. Let this happen at t_1. Then at t=t_1+1, we have S_{t_1+1}=C_vC_2.

At this point, let us consider three possible events. The system can lose C_v, lose C_2, or mutate C_v. Losing C_2 results in the viable C_v, and the system restarts. This happens in approximately \dfrac{2}{p_d} time, since half of all deletions remove C_2. This will be the most common event, since the chances of producing a viable C_2 or of mutating into a viable C_3C_2 are both very low. In fact, C_vC_2 must spend \dfrac{2}{p_mp_v} time as itself before it is likely to discover a viable C_3C_2 through mutation, or \dfrac{2}{p_dp_v} before it discovers a viable C_2. The latter event isn’t too interesting, since it’s like resetting, but with a viable C_2 instead of C_v, which changes nothing (this floor of one component is also where Gould’s insight comes from). Finding C_3C_2 is interesting, however, since this is potentially the beginning of irreversibility.

Since we need \dfrac{2}{p_mp_v} time as C_vC_2 to discover C_3C_2, but each time we discover C_vC_2 it stays that way for only \dfrac{2}{p_d} time on average, we must discover C_vC_2 about \dfrac{p_d}{p_mp_v} times before we have a good chance of discovering a viable C_3C_2. Since each discovery of a viable C_vC_2 takes \dfrac{1}{p_a\cdot p_v} time, in total it will take approximately

\dfrac{1}{p_a p_v}\cdot\dfrac{p_d}{p_mp_v}=\dfrac{p_d}{p_ap_mp_v^2}

timesteps before we successfully discover C_3C_2. Phew. For small p_v, it takes an awfully long time before any irreversibility kicks in.

Once we discover a viable C_3C_2, there is a 1-(1-p_v)^2 probability that at least one of C_3 and C_2 is viable by itself, in which case a loss can immediately kick in to restart the system at a single component. Only with probability (1-p_v)^2, which is close to 1 for small p_v, is neither component viable on its own, so the number of timesteps before we discover a viable C_3C_2 in which neither component is viable by itself is:

\dfrac{p_d}{p_ap_mp_v^2(1-p_v)^2} .
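These back-of-the-envelope times can be checked against a direct simulation. The Python sketch below is my own illustration, not a definitive implementation. It assumes integer components with C_v as component 0, |\mathbf{C}|=10^9, and viability drawn once per system with probability p_v and then remembered (so returning to C_v keeps it viable). It stops at the first viable 2-system in which neither component is viable alone and compares the mean stopping time with the estimate \dfrac{p_d}{p_ap_mp_v^2(1-p_v)^2}, where (1-p_v)^2 is the probability that neither component is viable by itself. The function names and parameter values are hypothetical, and p_v=0.1 is far from tiny, so only order-of-magnitude agreement should be expected.

```python
import random
import statistics


def estimated_time(p_a, p_m, p_v):
    """Estimate p_d / (p_a p_m p_v^2 (1 - p_v)^2) for reaching a locked-in 2-system."""
    p_d = 1.0 - p_a - p_m
    return p_d / (p_a * p_m * p_v**2 * (1.0 - p_v) ** 2)


def time_to_locked_pair(p_a, p_m, p_v, p_n=1.0, n_components=10**9,
                        max_steps=10**7, seed=None):
    """Simulate from S_0 = C_v; return the first timestep at which S_t is a viable
    2-system neither of whose components is viable alone (None if never reached)."""
    rng = random.Random(seed)
    viable = {(0,): True, (): False}  # C_v is component 0; the empty system is unviable

    def is_viable(system):
        # Each system's viability is drawn once, with probability p_v, then remembered.
        if system not in viable:
            viable[system] = rng.random() < p_v
        return viable[system]

    s = (0,)
    for t in range(1, max_steps + 1):
        r = rng.random()
        if r < p_a:  # addition of a random component to the last spot
            candidate = s + (rng.randrange(n_components),)
        elif r < p_a + p_m:  # mutation to one of the other |C| - 1 components
            i = rng.randrange(len(s))
            b = rng.randrange(n_components - 1)
            if b >= s[i]:
                b += 1
            candidate = s[:i] + (b,) + s[i + 1:]
        else:  # deletion of a random component (probability p_d)
            i = rng.randrange(len(s))
            candidate = s[:i] + s[i + 1:]
        if is_viable(candidate) and rng.random() < p_n:
            s = candidate  # S_{t+1} = R_t
        if len(s) == 2 and not is_viable(s[:1]) and not is_viable(s[1:]):
            return t
    return None


p_a, p_m, p_v = 0.3, 0.3, 0.1  # hypothetical values, so p_d = 0.4
runs = [time_to_locked_pair(p_a, p_m, p_v, seed=k) for k in range(200)]
print(f"simulated mean: {statistics.mean(runs):.0f} timesteps")
print(f"estimate:       {estimated_time(p_a, p_m, p_v):.0f} timesteps")
```

Because the estimate ignores excursions through 3-systems and other second-order paths to a locked-in pair, I would only trust agreement within a small factor.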

Unfortunately, this isn’t quite irreversibility. I will now show that the time it takes for C_3C_2 to reduce to a viable single component is on the same order as the time it takes to find a viable C_3C_4C_5 or C_4C_2C_5 in which all single deletions (for C_3C_4C_5, the single deletions are C_4C_5, C_3C_5, and C_3C_4) are unviable.

We know that C_3 and C_2 are unviable on their own. Thus, to lose a component viably, C_3C_2 must mutate to C_3C_v (or C_vC_2), such that C_3C_v (or C_vC_2) is viable and C_v is also independently viable. Reaching a viable mutant of C_3C_2 takes \dfrac{1}{p_mp_v} time. The chance that the mutated component will itself be independently viable is p_v. Thus, the approximate time to find one of the viable systems C_3C_v or C_vC_2 is \dfrac{1}{p_mp_v^2}. Reaching C_v from there takes \dfrac{2}{p_d} time, for a total of

\dfrac{1}{p_mp_v^2}+\dfrac{2}{p_d}\approx\dfrac{1}{p_mp_v^2}

time. It is also easy to see that going from C_3C_2 to a three-component system (either C_3C_4C_5 or C_4C_2C_5) in which the loss of any component renders the 3-system unviable takes time on the order of \dfrac{1}{p_v^2}. It takes \dfrac{1}{p_ap_v} time to discover the viable 3-system C_3C_2C_5. It then takes \dfrac{3}{2p_mp_v} time to reach one of C_3C_4C_5 or C_4C_2C_5: two thirds of all mutations hit either C_3 or C_2, and a fraction p_v of these mutations are viable (mutating C_5 does not help, since deleting the new component would return us to the viable C_3C_2). Each time a viable 3-system is discovered, the system tends to stay there for only \dfrac{3}{p_d} time, since one deletion in three removes C_5. We must therefore discover viable 3-systems \dfrac{3}{2p_mp_v}\div\dfrac{3}{p_d}=\dfrac{p_d}{2p_mp_v} times before we have a good chance of discovering a viable 3-system that is locked in, one that cannot quickly lose a component and remain viable. In total, we need

\dfrac{1}{p_ap_v}\cdot\dfrac{p_d}{2p_mp_v}=\dfrac{p_d}{2p_ap_mp_v^2}

time. Both this and the reduction time scale as \dfrac{1}{p_v^2}, differing only by factors involving p_m, p_a and p_d, which are all relatively large numbers (at least compared to p_v). Hence there is no “force” for the evolution of increased complexity, except the random walk force.
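The comparison can be made concrete with a few lines of arithmetic. This hypothetical sketch plugs made-up operation probabilities into two estimates as I have reconstructed them: the time for a locked-in C_3C_2 to shrink back to a viable single component, \dfrac{1}{p_mp_v^2}+\dfrac{2}{p_d}, and the time for it to grow into a locked-in 3-system, \dfrac{p_d}{2p_ap_mp_v^2}. As p_v shrinks, both grow like \dfrac{1}{p_v^2}, and their ratio settles near \dfrac{p_d}{2p_a}, a number of order one.

```python
def time_to_reduce(p_m, p_d, p_v):
    """Locked-in C_3C_2 -> viable single component: 1/(p_m p_v^2) + 2/p_d."""
    return 1.0 / (p_m * p_v**2) + 2.0 / p_d


def time_to_lock_in_triple(p_a, p_m, p_d, p_v):
    """Locked-in C_3C_2 -> locked-in 3-system: p_d / (2 p_a p_m p_v^2)."""
    return p_d / (2.0 * p_a * p_m * p_v**2)


p_a, p_m, p_d = 0.3, 0.3, 0.4  # hypothetical operation probabilities
for p_v in (1e-2, 1e-3, 1e-4):
    down = time_to_reduce(p_m, p_d, p_v)
    up = time_to_lock_in_triple(p_a, p_m, p_d, p_v)
    print(f"p_v={p_v:g}: reduce {down:.3g}, grow {up:.3g}, grow/reduce {up / down:.3f}")
# The ratio approaches p_d / (2 p_a), about 0.667 here: neither direction wins
# by more than a constant factor, so there is no systematic push toward growth.
```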

In the next post, I will back up these statements with simulations and show how this type of process allows us to define different types of structure, some of which increase in complexity.

Irreversible evolution

Nine for mortal men doomed to die.

In the last post I wrote about the evolution of complexity and Gould’s and McShea’s approaches to explaining the patterns of increasing complexity in evolution. That hardly exhausts the vast multitude of theories out there, but I’d like to put down some of my own thoughts on the matter, as immature as they may seem.

My intuition is that if we chose a random life-form out of all possible life-forms, truly a random one, without respect to history, the time it takes to evolve that life-form, etc., then this randomly chosen life-form would be inordinately complex, with a vast array of structure and hierarchy. I believe this because there are simply many more ways to be alive if one is incredibly complex: there are more ways to arrange oneself. This intuition gives me a way to define an entropic background such that evolution is always tempted, with or against fitness considerations, to relax to high entropy and become this highly complex form of life.

I think this idea is original; at least, I haven’t heard of it elsewhere yet. But given my impoverished reading I might be very wrong, as wrong as when I realized that natural selection can’t optimize mutation rates or evolvability (something well known to at least three groups of researchers before me, as I realized much later). If anyone knows of someone who had this idea before, let me know!

I will try to describe how I think this process might come about.

Consider the space of all possible systems, \mathbf{S}. Any system S\in\mathbf{S} is made up of components, chosen out of the large space \mathbf{C}. A system made out of n components I shall call an n-system. Of the members of \mathbf{S}, let there be a special property called “viability”. We will worry later about what exactly viability means, for now let’s simply make it an extremely rare property, satisfied by a tiny fraction, 0<p_v\ll1, of \mathbf{S}.

At the beginning of the process, let there be only 1-systems, or systems of one component. If \mathbf{C} is large enough, then somewhere in this space is at least one viable component; call this special component C_v. Somehow, through sheer luck, the process stumbles on C_v. The process then dictates some operations that can happen to C_v. For now, let us consider three operations: addition of a new component, mutation of the existing component, and removal of an existing component. The goal is to understand how these three operations affect the evolution of the system while preserving viability.

Let us say that viability is a highly correlated attribute: systems close to a viable system are much more likely to be viable than a randomly chosen system. We could introduce three probabilities here: the probability of viability upon the addition of a new component, upon the removal of an existing component, and upon the mutation of an existing component. For now, however, since the process is at a 1-system, removal of components cannot preserve viability, as Gould astutely observed. Thus, we need only consider additions and mutations. For simplicity I will consider only one probability, p_e, the probability of viability upon an edit.

It turns out that two parameters, |\mathbf{C}| (the size or cardinality of \mathbf{C}) and p_e, are critical to the evolution of the system. There are two types of processes that I’m interested in, although there are more types than the two I list below:

1) “Easy” processes: |\mathbf{C}| is small and p_e is large. There are only a few edits / additions we can make to the system, and most of them are viable.

2) “Hard” processes: |\mathbf{C}| is very large and p_e is small, but not too small. There are many edits possible and only a very small fraction of these edits are viable. However, p_e is not so small that none of these edits are viable. In fact, p_e is large enough that not only are some edits viable, but these edits can also be discovered in reasonable time and with a reasonable population size, once we add these ingredients to this model (not yet).

The key point is that easy processes are reversible and hard processes are not. Most existing evolutionary theory has so far dealt with easy processes, which lead to a stable optimum driven only by environmental dictates of what is fittest, because the viable system space is strongly connected. Hard processes, on the other hand, have a viable system space that is connected, but very sparsely so. This model really is an extension of Gavrilets’ models, which is why I spent so much time reviewing them!

Now let’s see how a hard process proceeds. It’s actually very simple: C_v either mutates around to other viable 1-systems, or adds a component to become a viable 2-system. By the definition of a hard process, these two events are possible, but might take a bit of time. Let’s say we are at a 2-system, C_vC_2. Mutations of the 2-system might also hit a viable system. Sooner or later, we will hit a viable C_3C_2 as a mutation of C_vC_2. At this point, it’s really hard for C_3C_2 to become a 1-system: it needs a mutation back to C_vC_2 and then a loss to C_v. This difficulty is magnified if we hit C_iC_2 as C_3C_2 continues to mutate, since C_i might be a mutational neighbor of C_3 but not of C_v. Due to the large size of the set \mathbf{C}, reverse mutation to C_v becomes virtually impossible. On the other hand, let’s say we reached C_iC_j. Removing a component results in either C_i or C_j. The probability that at least one of them is viable is 1-(1-p_e)^2, which, for very small p_e, is still small. Thus, while growth in size is possible, because a system can grow into many, many different things, reduction in size is much more difficult, because one can only reduce into a limited number of things. Since most things are not viable, reduction is much more likely to result in an unviable system. This isn’t to say reduction never happens or is impossible, but overall there is a very strong trend upwards.

All this is very hand-waving, and in fact a naive formalization of it doesn’t work, as I will show in the next post. But the main idea should be sound: reduction of components is very easy right after the addition of a component (we can just lose the newly added component), but if no reduction happens for a while (say by chance), then mutations lock the number of components in. Since the mutation happened in a particular background of components, the viability property after mutation is true only with respect to that background. Changing that background through mutation or addition is occasionally okay, because there is a very large space of things that one can grow or mutate into, but all the possible systems that one can reduce down to may be unviable. For an n-system, there are n possible reductions, but |\mathbf{C}| possible additions and (|\mathbf{C}|-1)\cdot n possible mutations. As long as |\mathbf{C}| \gg n, this line of reasoning holds. In fact, it holds until 1-(1-p_e)^n becomes large, at which point the probability that a system can lose a component and remain viable becomes significant.
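The counting argument can be sketched numerically. In an n-system there are n possible reductions, |\mathbf{C}| possible additions and (|\mathbf{C}|-1)\cdot n possible mutations, and the chance that at least one reduction is viable is 1-(1-p_e)^n. The hypothetical Python snippet below tabulates these quantities and finds the smallest n at which that chance reaches one half, which I take (arbitrarily) as the point where it “becomes large”. Since 1-(1-p_e)^n\ge\dfrac{1}{2} exactly when n\ge\dfrac{\ln 2}{-\ln(1-p_e)}, the threshold is roughly \dfrac{0.69}{p_e} for small p_e. The values of p_e and |\mathbf{C}| are made up.

```python
import math


def move_counts(n, size_c):
    """Numbers of possible reductions, additions and mutations of an n-system."""
    return {"reductions": n, "additions": size_c, "mutations": (size_c - 1) * n}


def p_some_reduction_viable(n, p_e):
    """Probability that at least one of the n single-component deletions is viable."""
    return 1.0 - (1.0 - p_e) ** n


def size_where_reduction_likely(p_e, threshold=0.5):
    """Smallest n with 1 - (1 - p_e)^n >= threshold (the threshold is an arbitrary choice)."""
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - p_e))


p_e, size_c = 1e-3, 10**6  # hypothetical values
print(move_counts(10, size_c))  # {'reductions': 10, 'additions': 1000000, 'mutations': 9999990}
n_star = size_where_reduction_likely(p_e)
print(n_star, p_some_reduction_viable(n_star, p_e))
```

With these numbers, reductions are outnumbered by additions and mutations by five to six orders of magnitude, yet the trend upward only stalls once n reaches several hundred components.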

Phew. In the next post I shall try to tighten this argument.

Evolution of complexity

My cats are jittery. The end of the world must be near.

In this post I will outline the questions I wish to answer, the existing approaches to these questions and the intuitions I wish to formalize. Thus I’m going to break all the promises I made in the last post; but really, this should have been the last post. Somehow I got sidetracked…

There is an intuition that life has gotten more complex over geological time: more hierarchies have been built, parts have become more specialized, and life has possibly even become more intelligent. This intuition holds not only for life, but for societies. Governments, bureaucracy, culture, social divisions: all seem to become more complex with time. The modern division of labor is the pinnacle of this. Why?

Orthodox evolutionary theory does not have an answer to this. And well it does not, because the answer is both nasty — and, I will argue, deathly wrong. Orthodox evolutionary theory proposes that natural selection is the major driver of evolutionary change, that organisms change their forms and types because of differences in fitness. Ergo, things must have become more complex, specialized, etc. — because they are more fit. Humans, as such recent players in evolutionary history, are very complex, very specialized, and very smart — hence we are the evolutionary climax, the last of the rungs of the evolutionary ladder to perfection…

This answer is the disowned natural bastard of orthodox evolutionary theory. It is never given out in modern polite scientific discourse, and most biologists would argue militantly that it is not true. Like all disowned bastards of royalty, however, its head pokes up with alarming frequency, each time with new insurgent allies looking for a coup d’état: eugenics, racism, social Darwinism, all the bad movies using evolution as a deus ex machina (X-Men!), etc. Hear, hear, it seems to say, Evolution is making us better, and we should hasten her along.

I’m fully convinced that this bastard child is incredibly wrong and we ought to have its head on a pike. The trouble is, parts of its parents might have to go also. The parts that lead to this bastard child, at any rate — I will argue that the long trends of macroevolution have little to do with natural selection.

For simplicity, let’s focus on complexity alone for the moment. Current (polite, modern, scientific) discussion about the evolution of complexity hinges on whether it is increasing at all; it turns out that “complexity” is an awfully slippery thing to measure. McShea did a great deal of work here (1991, 1996, 2005), although he and his colleagues aren’t the only group (e.g. Heylighen 1999). In any case, there’s a vast literature on this. I would carefully submit, however, that complexity has in fact increased, since we don’t find any pre-cellular living forms (the earliest living form cannot have been a fully formed bacterium!) and even our oldest archaebacteria are well removed from the earliest living form (say, a self-replicating RNA). Nor did anything as complex as mammals show up right at the beginning of life. Although this work on the definition and measurement of complexity is incredibly valuable, we don’t need a thermometer to tell us that boiling water is hotter than ice.

The question is therefore why. The answer of orthodox evolutionary theory, that natural selection drove the extinction of simple organisms and made organisms more and more complex, is intensely unsavory. It’s more than just the political and cultural distastefulness of the answer and the capacity for people to abuse it: I, and many others, think it’s actually wrong. But then there must be another force outside of natural selection that can drive evolutionary trends on the geological scale. What is this other force? I’ll summarize the existing arguments (and my opinions; why else write a blog?) below:

Gould (1997) has famously argued that complexity, on average, has nothing to do with natural selection. The increase in complexity over time, he says, is simply the result of a random walk. Sometimes complexity is good for the organism and it grows more complex. Sometimes complexity is less good and organisms grow less complex. However, for biochemical reasons, the first life forms had to be very simple, and complexity cannot go below zero. Thus, the evolution of complexity is that of a random walk with an absorbing state at zero. The average complexity then naturally increases, but complexity itself, per se, doesn’t do anything for fitness at all: what is good for fitness is entirely environmentally dependent.

Gould thus posits no positive force for the increase in complexity. McShea, who concluded that complexity was increasing after all, considers drift to be a positive, but weak, force for the increase of complexity. With Robert Brandon, he coauthored a book arguing that this is the case. A positive review is here. The idea is that mutation is a natural driver of diversity, which is synonymous with complexity. After all, one does not mutate into the same thing that one was. Thus, in the complete absence of selection, evolution progresses toward greater complexity. McShea and Brandon called this the Zero Force Evolutionary Law (hmmm… I pick up a hint of Asimov here).

I think Gould’s idea is very clever, but it contradicts empirical evidence. There are traits that seem to make their carriers fitter over a wide swath of environments; the eusociality of ants must have contributed to their dominance in the world’s ecology. Might complexity be such a trait? Probably not, considering the dominance of bacteria… but we cannot rule out, as Gould does, that complexity has an effect on fitness overall. Besides, according to Gould, if all of complexity is driven by environmental conditions, then all lineages should show theoretically unlimited movement in complexity. Thus, according to this theory, whenever environments favor the loss of mitochondria in eukaryotes, eukaryotes should lose them, and good riddance. Unfortunately, for all strains of eukaryotes, losing mitochondria is death, regardless of the environment, so there is no strain of prokaryotes with eukaryotes as their ancestor. This is the intuition I hope to tighten later. Similarly, of the many billions of cases of cancer that have occurred throughout evolutionary history, there is no species of single-celled organism that had an amphibian, reptile, or mammalian ancestor (I’m not sure about sponges). Once the organism dies, the cancer dies (unless it’s kept alive in a lab). None of the cancer cells could revert to unicellularity, although it would undoubtedly be advantageous for them to do so. Although multicellular-to-unicellular evolution has certainly occurred, it doesn’t seem to have ever occurred in species where the viability of the organism depends on an intense and obligate integration evolved over billions of years. Thus, contrary to Gould’s claim, there seem to be some plateaus of complexity that, once stepped onto, cannot be descended from.

McShea’s idea, on the other hand, I’m skeptical about. It reminds me of a flavor of mutationism and orthogenesis that has been soundly routed in the course of the history of evolutionary thought, with good reason. Natural selection is an awfully powerful force, strong enough to beat the second law of thermodynamics every time. Most mutations increase diversity, yes, but most mutations also bring us closer to a ball of gas, and yet we aren’t balls of gas. The authors seem to believe that it is a gentle breeze of a force blowing in the background, such that if the selection for or against complexity averaged out to nearly zero over time, then this force would be sufficient to provide a long-term trend. With no mathematical model behind their reasoning, I cannot form an informed opinion on whether this might be true, but I highly doubt it. What I think is much more likely is that the evolution of complexity would be precisely as dictated by natural selection, whether it is a random walk or not, and only increased slightly at all points in time by the gentle breeze. Consider the following 2-minute drawing in GIMP:

Different predictions for the evolution of complexity

It looks terrible, I know, and the axes are unlabeled. Okay, okay: the x-axis is time and the y-axis is some measure of complexity. Let’s say that natural selection is the black line, so sometimes complexity is selected up and sometimes down, but there’s no overall trend (well, there should be, since it’s a random walk with an absorbing state, but bear with me here). Red is what I think McShea and Brandon are proposing: that there’s a gentle background force moving complexity up. But I think that the gray line is what would happen: the evolution of complexity, under McShea and Brandon’s force, would exactly reflect natural selection, with no trend. The force would nudge complexity to be a bit higher than it would otherwise be, but that’s it.

Wow, this post has gotten long already; you can’t say very much in a post! Next post (and I won’t break any promises this time) will deal with my own thoughts on increasing complexity and its links to the holey landscapes model.

Evolving viable systems

It got cold really, really quickly today… Winter is coming  –GRRM

In my last two posts I wrote about holey adaptive landscapes and some criticisms I had of the existing model. In this post, I will motivate hijacking the model for my own purposes, for macroevolution :-) That’s the great thing about models: by relabeling the ingredients as something else, as long as the relationships between elements of the model hold true, we may suddenly gain an entirely fresh insight about the world!

As we might recall, the holey adaptive landscapes model proposes that phenotypes are points in n-dimensional space, where n is a very large number. If each phenotype is a node, then mutations that change one phenotype to another can be considered edges. In this space, for large enough n, even nodes with very rare properties can percolate and form a giant connected cluster. Gavrilets, one of the original authors of the model and its main proponent, considered this rare property to be high fitness. This way, evolution never has to cross fitness valleys. However, I find this unlikely; high fitness within a particular environment is a very rare property. If the property is sufficiently rare, then even if the nodes form a giant connected cluster, the connections between nodes may be so tenuous that there is not enough time or population size to explore them.

Think of a blind fruitfly banging its head within a sphere. On the wall of the sphere is pricked a single tiny hole just large enough for the fruitfly to escape. How much time will it take for the fruitfly to escape? The answer clearly depends on the size of the sphere. In n dimensions, where n is large, the sphere is awfully big. Now consider the sphere to be a single highly fit phenotype, and a hole to be an edge to another highly fit phenotype. The existence of the hole is not sufficient to guarantee that the fruitfly will find the exit in any reasonable time. In fact, even a giant pack of fruitflies (all the fruitflies that ever existed) may not be able to find it, given all the time that life has evolved on Earth. That’s how incredibly large the sphere is: the exit must not only exist, it must be sufficiently common.

The goal of this post is to detail why I’m so interested in the holey adaptive landscapes model. I’m interested in its property of historicity, the capacity to be contingent and irreversible. I will define these terms more carefully later. Gavrilets has noted this in his book, but I can find no insightful exploration of this potential. I hope this model can formalize some intuitions gained in complex adaptive systems, particularly those of evo-devo and my personal favorite pseudo-philosophical theory, generative entrenchment (also here and here). Gould had a great instinct for this when he argued for the historical contingency of evolutionary process (consider his work on gastropods, for example, or his long rants on historicity in his magnum opus, although I despair of picking out a specific section).

Before I go on, I must also rant. Gould’s contingency means that evolution is sensitive to initial conditions, yes, but this does not mean history is chaotic in the mathematical sense of chaos. Chaos is not the only way for a system to be sensitive to initial conditions. In fact, mathematical chaos is preeminently ahistorical: just like equilibrating systems, chaotic systems forget history in a hurry, in total contrast to what Gould meant, which is that history should leave an indelible imprint on all of the future. No matter what the initial condition, chaotic systems settle onto the same strange attractor, the same distribution over phase space. The exact trajectory depends on the initial condition, yes, but because the smallest possible difference in initial conditions quickly translates into a completely different trajectory, no matter how you begin, your future is… chaotic. Consider two trajectories that began very differently: the future difference between them is no greater than that between two trajectories that began with the slightest possible difference. The difference between these differences in initial conditions is quickly obliterated by time. Whatever the atheist humanists say, chaos gives no more hope to free will than quantum mechanics. Lots of people seem to have gone down this hopeless route, not least of whom is Michael Shermer, who, among many other places, writes here:

And as chaos and complexity theory have shown, small changes early in a historical sequence can trigger enormous changes later… the question is: what type of change will be triggered by human actions, and in what direction will it go?

If he’s drawing this conclusion from chaos theory, then the answer to his question is… we don’t know, we can’t possibly know, and it doesn’t matter what we do, since all trajectories are statistically the same. If he’s drawing it from “complexity theory”, which is not yet a single theory with core results of any sort, then it’s a theory entirely unknown to me.

No, interesting historical contingency is quite different. We will see whether the holey landscapes model can capture its essence more accurately.

Things in the holey landscapes model generally get better if we take the rare property to be viability instead of high fitness. In fact, Gavrilets uses “viable” and “highly fit” somewhat interchangeably, although I suspect he always means the latter. By viable, I mean that the phenotype is capable of reproduction in some environment; I don’t care how well it reproduces. For ease of discussion, let’s say that viable phenotypes also reproduce above the error threshold, and that there exists an environment where they can reproduce with absolute fitness >1. Otherwise such a phenotype is doomed to extinction in all environments, and then it’s not very viable, is it?

It turns out that the resulting model contains an interesting form of irreversibility. I will give the flavor here and save the technical details for the next post. Consider our poor blind fruitfly, banging its head against the sphere. Because we consider viability instead of “high fitness”, there are now lots of potential holes in the sphere. Each potential hole is a neighboring viable phenotype. The environment opens or closes the hole by dictating whether that neighboring viable phenotype is fit.

“Aha,” an astute reader might say, “but this is no better than Gavrilets’ basic model. The number of open holes at any point must be very small, since they are also subject to the double filter of viability and high fitness. How can we find the open holes?”

The difference is that after an environmental change, the sphere we are currently in might be very unfit. The second filter, high fitness, is then much less constrictive. A neighbor merely has to be fitter than the current sphere, which might be on the verge of extinction. A large proportion of viable phenotypes may be “open” holes, unlike in the basic model, where only the highly fit phenotypes are open. Among viable phenotypes, highly fit ones may be rare. Those that are somewhat fitter than an exceedingly unfit phenotype may be much more common. And it’s only a matter of time (often a fairly short time on the geological scale) before any phenotype is rendered exceedingly unfit. So in this model, too, evolution does not have to cross a fitness valley. However, I’m using a much more classical mechanism: peak shifts due to environmental change, rather than a percolation cluster of highly fit phenotypes.

Now that our happier fruitfly is in the neighboring sphere, what is the chance that it will return to its previous sphere, as opposed to choosing some other neighbor? The answer is… very low. The probability of finding any particular hole has not improved and verges on impossibility, although the probability of finding some hole is much better. Moreover, the particular hole that the fruitfly went through dictates which holes it will have access to next. If it can’t return to its previous sphere, then it can’t go back and remake the choice. This allows for much more interesting contingency than mere chaos.
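The return probability can be made concrete with a toy sketch. The assumptions are my own choosing, not the formal model promised for the next post:

- Genotypes are n-bit integers, and a mutation flips one bit.
- Each genotype is viable with probability p_viable, decided once and then fixed.
- Every environmental change redraws a fresh uniform random fitness for the current phenotype and its one-mutation neighbors.

The population then moves to a randomly chosen open hole, i.e. a viable neighbor fitter than its current phenotype. The function name return_frequency and all parameter values are hypothetical.

```python
import random


def return_frequency(n=200, p_viable=0.1, epochs=2000, seed=1):
    """Toy ratchet (illustrative assumptions only): how often does a move undo the previous one?"""
    rng = random.Random(seed)
    viable = {0: True}  # start from a viable genotype; others are decided lazily

    def is_viable(g):
        # viability is drawn once per genotype and then stays fixed across environments
        if g not in viable:
            viable[g] = rng.random() < p_viable
        return viable[g]

    current, previous = 0, None
    moves = returns = 0
    for _ in range(epochs):
        neighbours = [current ^ (1 << i) for i in range(n)]
        # environmental change: fresh fitness values for this epoch
        fitness = {g: rng.random() for g in [current] + neighbours}
        # open holes: viable neighbours that are fitter than the current genotype
        open_holes = [g for g in neighbours
                      if is_viable(g) and fitness[g] > fitness[current]]
        if not open_holes:
            continue  # no open hole in this environment; wait for the next change
        nxt = rng.choice(open_holes)
        moves += 1
        if nxt == previous:
            returns += 1  # this move went straight back through the last hole
        previous, current = current, nxt
    return moves, (returns / moves if moves else 0.0)


moves, freq = return_frequency()
print(moves, freq)
```

In this toy, the previous phenotype is always one of the viable neighbors. Even so, it is only one open hole among many, so immediate reversals should be a small fraction of moves, roughly the inverse of the typical number of open holes. With realistically astronomical numbers of viable neighbors, that fraction would shrink further.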

This mechanism has much in common with Muller’s Ratchet and Dollo’s Law. It is an attempt to generalize the ratchet mechanism while formalizing exactly what we mean by irreversibility. I will tighten the argument next Thursday.

Criticisms of holey adaptive landscapes

:-) My cats say hello.

In my last post I wrote about holey adaptive landscapes, a model of evolution in very high-dimensional space where there is no need to jump across fitness valleys. Consider the phenotype of an organism to be a point in n-dimensional space, where n is some large number (say, tens of thousands, as in the number of our genes). The idea is that high-fitness phenotypes then easily percolate even if they are rare. By percolate, I mean that most high-fitness phenotypes are connected in one giant component. Evolution from one high-fitness “peak” to another therefore does not involve crossing a valley; there are only well-connected high-fitness ridges. This is why the model is called “holey”. Highly fit phenotypes are entirely connected within the fitness landscape, but they run around large “holes” of poorly fit phenotypes that seem to be carved out of the landscape.

This is possible because as n (the number of dimensions) increases, the number of possible mutants of any phenotype also increases. The actual rate of increase depends on the model of mutation. It is linear if we take n to be the number of genes, and exponential if we take n to be the number of independent but continuous traits that characterize an organism. Once the number of possible mutants becomes so large that each highly fit phenotype has, on average, another highly fit phenotype as a neighbor, percolation is assured.

More formally, we can consider the basic model where highly fit phenotypes are randomly distributed over phenotype space: any phenotype has probability p_\omega of being highly fit. Let S_m be the average size of the set of all mutants of a phenotype. For example, suppose n is the number of genes and the only mutations we consider are loss-of-function mutations of single genes. Then S_m is simply n, since that is the number of genes that can be lost, and therefore the number of possible mutants. Percolation is reached if S_m>\dfrac{1}{p_\omega}. Later extensions also consider cases where highly fit phenotypes occur in clusters, and they show that percolation is still easily achieved (Gavrilets’ book Fitness Landscapes and the Origin of Species; Gravner et al.).
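The threshold can be checked numerically under simplifying assumptions that are mine:

- Phenotypes are the 2^n binary strings of length n.
- A mutation flips one bit, so S_m = n as in the single-gene example, except that here a lost function can also be regained.
- Each phenotype is independently highly fit with probability p.

The sketch reports what fraction of highly fit phenotypes lie in the largest connected cluster. The function name is hypothetical.

```python
import random
from collections import deque


def largest_fit_cluster_fraction(n, p, seed=0):
    """Site percolation on the n-dimensional hypercube (S_m = n neighbours per phenotype)."""
    rng = random.Random(seed)
    size = 1 << n
    fit = [rng.random() < p for _ in range(size)]  # each phenotype fit with probability p
    total_fit = sum(fit)
    if total_fit == 0:
        return 0.0
    seen = [False] * size
    best = 0
    for start in range(size):
        if not fit[start] or seen[start]:
            continue
        # breadth-first search over fit phenotypes reachable by single mutations
        seen[start] = True
        queue = deque([start])
        component = 0
        while queue:
            v = queue.popleft()
            component += 1
            for bit in range(n):
                w = v ^ (1 << bit)  # flip one bit = one mutation
                if fit[w] and not seen[w]:
                    seen[w] = True
                    queue.append(w)
        best = max(best, component)
    return best / total_fit


print(largest_fit_cluster_fraction(14, 0.5))   # p far above 1/n
print(largest_fit_cluster_fraction(14, 0.02))  # p far below 1/n
```

For n = 14, a p well above 1/n puts nearly all highly fit phenotypes into one cluster. A p well below 1/n leaves them scattered in small islands.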

I have several criticisms of the basic model. As an aside, I find criticism to be the best way to honor any line of work. It means we see in it a potential worthy of a great deal of thought and improvement. My criticisms follow:

1) We don’t have a clue what p_\omega is, not even the crudest ballpark idea. To grapple with this question, we must understand what makes a phenotype admissible. For example, we certainly should not consider any combination of atoms to be a phenotype. The proper way to define admissible phenotypes is to define the possible operations (mutations) that move us from one phenotype to another; that is, we must define what a mutation is. Suppose only DNA mutations are admissible operations, and an identical DNA string produces the same phenotype in all environments. Both are risible assumptions, but let’s start here. Then the space of all admissible phenotypes is the set of all possible DNA strings. Let us consider only genomes a billion letters long. This space has, of course, 4^{10^9} elements. What fraction of these combinations are highly fit? The answer must be a truly ridiculously small number. It is so small that if S_m\approx O(n), I would imagine there is no way for highly fit phenotypes to reach percolation.

Now, if S_m\approx O(a^n), that is a wholly different matter. For example, Gravner et al. argued that a\approx 2 for continuous traits in a simple model. If n is in the tens of thousands, my intuition tells me it’s possible for highly fit phenotypes to reach percolation, since exponentials make really, really big numbers really quickly. Humans are well known to have terrible intuitions about giant and tiny numbers. Even so, the absence of fitness valleys becomes at least plausible. But… it might not matter, because:
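To see why the growth rate of S_m matters so much, we can ask for the smallest n that satisfies S_m>\dfrac{1}{p_\omega} under each mutation model. The sketch works in base-10 logarithms to avoid astronomically large numbers. The function names and the example value p_\omega=10^{-100} are illustrative assumptions. The default a = 2 follows the Gravner et al. estimate quoted above.

```python
import math


def min_n_linear(log10_inv_p: int) -> int:
    """Smallest n with S_m = n > 1/p_omega, where 1/p_omega = 10**log10_inv_p."""
    return 10 ** log10_inv_p + 1  # exact integer arithmetic


def min_n_exponential(log10_inv_p: float, a: float = 2.0) -> int:
    """Smallest integer n with S_m = a**n > 1/p_omega, i.e. n * log10(a) > log10(1/p_omega)."""
    return math.floor(log10_inv_p / math.log10(a)) + 1


print(min_n_exponential(100))             # dimensions needed if S_m ~ 2^n
print(min_n_linear(100) == 10**100 + 1)   # dimensions needed if S_m ~ n
```

With p_\omega=10^{-100}, the exponential model needs only n = 333 dimensions, whereas the linear model needs more than 10^{100}.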

2) Populations have finite size, and evolution proceeds in finite time. Thus the number of mutants that any phenotype will actually explore is linear in population size and time, even if the number it could potentially explore is much larger. The number of mutants S_m may grow exponentially with n, but that doesn’t matter if we never have enough individuals or time to explore that giant number. By the same token, it doesn’t matter that highly fit phenotypes form a percolating cluster if the ridges connecting peaks aren’t thick enough to be discovered. For evolution never to have to cross fitness valleys, it is not enough for highly fit neighbors to exist; they must be common enough to be discovered. If everything populations realistically discover has low fitness, then evolution has to cross fitness valleys anyway.

How much time and population is realistic? Let’s consider bacteria, which number about 5\times 10^{30}. For generation time, let’s say they divide once every twenty minutes, the standard optimal laboratory doubling time for E. coli. Most bacteria in natural conditions have much longer generation times. Then if bacteria evolved 4.5 billion years ago, there have been approximately 118,260,000,000,000, or ~1.2\times 10^{14}, generations. Generously assuming that the population has always been as large as it is today, the total number of bacteria sampled across all of evolution is on the order of 6\times 10^{44}. Does that sound like a large number? Because it’s not. That’s the trouble with linear growth. Against 4^{10^9}, this is nothing. Even against 2^{10000} (taking n, the number of dimensions, to be 10000), 6\times 10^{44} is nothing. That is, we simply don’t have time to test all the mutants. Highly fit phenotypes had better make up more than \dfrac{1}{6\times 10^{44}} of the phenotype space, or we’ll never discover them. Is \dfrac{1}{6\times 10^{44}} small? Yes. Is it small enough? I’m not sure. Possibly not. In any case, this is the proper number to consider, not, say, 2^{10000}. The fact that S_m\approx O(a^n) is so large is moot.
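The arithmetic above can be checked in a few lines. The sketch keeps the same assumptions: 5×10^30 cells, a 20-minute generation time, 4.5 billion years of 365 days each, and a population that stays at today's size throughout. It compares the sampled total with the phenotype-space sizes in base-10 logarithms, since 4^{10^9} cannot be stored as a float. The variable names are my own.

```python
import math

POPULATION = 5e30             # bacterial cells alive at any one time (assumed constant)
YEARS = 4_500_000_000         # time since bacteria arose, as assumed in the text
GENERATIONS_PER_DAY = 24 * 3  # one division every 20 minutes

generations = YEARS * 365 * GENERATIONS_PER_DAY  # exact integer count
total_sampled = POPULATION * generations

log10_sampled = math.log10(total_sampled)
log10_dna_space = 1e9 * math.log10(4)        # genomes 10^9 letters long
log10_trait_space = 10_000 * math.log10(2)   # 2^n with n = 10000

print(f"generations:     {generations:,}")
print(f"total sampled:   {total_sampled:.2e}")
print(f"log10 sampled:   {log10_sampled:.1f}")
print(f"log10 4^(10^9):  {log10_dna_space:.3e}")
print(f"log10 2^10000:   {log10_trait_space:.1f}")
```

On a log scale the gap is stark. The sampled total has a base-10 exponent of about 45, compared with about 3,000 for 2^{10000} and about 6×10^8 for 4^{10^9}.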

3) My last criticism is, I think, the most difficult one for the model to answer. The holey adaptive landscapes model does not take environmental variation into account. To a great extent, it confuses the viable with the highly fit. In his book, Gavrilets often uses the term “viable”. But under the usual definition of viable (capable of reproduction), most viable phenotypes are clearly not highly fit. Different viable phenotypes might be highly fit under different environmental conditions, but fitness itself has little meaning outside a particular environment.

Including environmental conditions in this model is not straightforward. Let us apply the basic model to viable phenotypes, that is, strings of DNA that are capable of reproduction in some environment. Let us grant that everything Gavrilets et al. say is correct with respect to viable phenotypes: that they form a percolating cluster, and so on. Now, in a particular environment, these viable phenotypes will have different fitnesses. Suppose we consider only the phenotypes that are highly fit in a given environment. For these to form a percolating cluster, we would have to apply the reasoning of the model a second time. Every viable phenotype would have to be connected to so many other viable phenotypes that at least one of them is also highly fit. Here, we take “highly fit” to mean viable phenotypes with relative fitness greater than 1-\epsilon, where the fittest phenotype has relative fitness 1. This further dramatizes how unlikely evolution is to hit on “highly fit” phenotypes through a single mutation, given realistic population sizes and times. We must consider not p_\omega but p_v\times p_\omega, where p_v is the probability of being viable and p_\omega is the probability that a viable phenotype is highly fit. Both probabilities are almost certainly astronomically small, making the burden on the paltry number 6\times 10^{44} even heavier.
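The double filter can be written as expected counts. The sketch assumes, as in the basic model, that viability and high fitness are assigned independently at random, with p_\omega read as the probability that a viable phenotype is highly fit. N stands for the total number of individuals ever sampled (the 6\times 10^{44} above).

```latex
\begin{aligned}
\mathbb{E}[\text{highly fit one-mutation neighbors of a phenotype}] &= S_m\, p_v\, p_\omega, \\
\text{percolation of highly fit phenotypes requires}\quad S_m\, p_v\, p_\omega &> 1, \\
\mathbb{E}[\text{highly fit mutants ever discovered}] &\lesssim N\, p_v\, p_\omega, \\
\text{so discovery at all requires}\quad p_v\, p_\omega &\gtrsim \frac{1}{N} \approx \frac{1}{6\times 10^{44}}.
\end{aligned}
```

The percolation condition depends on S_m, which can be astronomically large. The discovery condition depends only on N, which is not.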

It’s my belief, then, that in realistic evolution with finite population and time, fitness valleys still have to be crossed. Either there are no highly fit phenotypes a single mutation away, or, if such mutations exist, the space of all possible mutations is too large to sample fully with finite population and time. The holey adaptive landscapes approach does not entirely circumvent the old problem of crossing fitness valleys.

Next Thursday, I will seek to hijack this model for my own uses, as a model of macroevolution.

Holey adaptive landscapes

Hello world :-)

My research interests have veered away from pure EGT, but my questions still center on evolution, particularly the evolution of complex systems made up of many small components working in unison. In particular, I’ve been studying Gavrilets et al.’s model of holey fitness landscapes. I think it’s a model with great potential for studying macroevolution, or evolution on very long timescales. I’m not the first to have this idea, of course. Arnold and many others have also seen the possible connection, although I think of it in a rather different light.

In this first post, I will give a short summary of this model, cobbled together from several papers and Gavrilets’ book, Fitness Landscapes and the Origin of Species. The basic premise is that organisms can be characterized by a large number of traits. When we say large, we mean very large: thousands or so. Gavrilets envisions this as the number of genes in an organism, so tens of thousands. The important thing is that each of these traits can change independently of the others.

The idea that organisms are points in very high-dimensional space is not new. Fisher had it in his 1930 classic The Genetical Theory of Natural Selection, where he used this insight to argue for micromutationism. In such high-dimensional space, most mutations of appreciable size are detrimental, so Fisher argued that the mutations contributing to adaptation must mostly be small. (Kimura, Orr and others later corrected this result. They argued that most adaptive mutations must be of intermediate size, since tiny mutations are unlikely to fix in large populations.)
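Fisher's argument can be checked with a small Monte Carlo sketch of what is now usually called Fisher's geometric model. The assumptions are mine, for illustration:

- The phenotype sits at distance z from a single optimum, and fitness decreases with distance to that optimum.
- A mutation is a step of fixed size r in a uniformly random direction.
- A mutation is beneficial if it lands closer to the optimum.

Fisher's classic approximation for that probability is 1-\Phi(r\sqrt{n}/(2z)), where \Phi is the standard normal CDF. The function names are hypothetical.

```python
import math
import random


def prob_beneficial(n, r, z=1.0, samples=20_000, seed=0):
    """Monte Carlo: fraction of random steps of size r that move closer to the optimum."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # a normalised Gaussian vector gives a uniformly random direction
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cos0 = v[0] / math.sqrt(sum(x * x for x in v))
        # phenotype at z * e_0, optimum at the origin: |z e_0 + r v|^2
        new_dist_sq = z * z + 2.0 * z * r * cos0 + r * r
        if new_dist_sq < z * z:
            hits += 1
    return hits / samples


def fisher_approx(n, r, z=1.0):
    """Fisher's approximation 1 - Phi(r sqrt(n) / (2z))."""
    x = r * math.sqrt(n) / (2.0 * z)
    return 0.5 * math.erfc(x / math.sqrt(2.0))


print(prob_beneficial(50, 0.01), fisher_approx(50, 0.01))  # tiny step
print(prob_beneficial(50, 1.0), fisher_approx(50, 1.0))    # large step
```

In n = 50 dimensions, a tiny step (r = 0.01z) is beneficial close to half the time, whereas a step as large as z almost never is. The sketch covers only Fisher's side of the story. The Kimura and Orr correction concerns fixation probabilities, which it does not model.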

However, even Fisher didn’t see another consequence of high-dimensional space, which Gavrilets exploited mercilessly. In a space of high enough dimension, there is no need to cross fitness valleys to move from one high-fitness phenotype to another; all high-fitness genotypes are connected. This is because connectivity is exceedingly easy in high-dimensional space. In two dimensions, there are only two independent directions in which to move from one point to another, and every extra dimension offers a new option for such movement. That’s also why chaotic behavior requires a minimum number of dimensions. We can’t embed a strange attractor in the two-dimensional phase plane of a continuous autonomous system, because trajectories there cannot cross one another, and this rules out the tangling a strange attractor needs. Three dimensions is better, but n-dimensional space with n in the tens of thousands is really powerful stuff.

Basically, every phenotype (every point in n-D space) is connected to a huge number of other points in n-D space. That is, every phenotype has a huge number of neighbors. Even if the probability of being highly fit is exceedingly small, chances are high that a highly fit phenotype exists among this huge number of neighbors. Suppose each highly fit phenotype is, on average, connected (via mutation) to another highly fit phenotype. We know that the percolation threshold is then reached, and almost all highly fit phenotypes are connected in one giant component. In this way, evolution does not have to traverse fitness minima.

If we consider mutations to be point mutations of genes, then mutation can be thought of as a Manhattan-distance walk in n-D space. That’s just a fancy way of saying that we have n genes and only one can be changed at a time. In that case, every phenotype has n neighbors. If the probability of being highly fit is greater than 1/n, highly fit organisms are connected. This is even easier if we consider mutations to be random movements in n-D space. Let an organism be characterized by \mathbf{p}=(p_1, p_2, ..., p_n), where p_i is the i^{th} trait. Let a mutation of \mathbf{p} produce \mathbf{p_m}=(p_1+\epsilon_1, ..., p_n+\epsilon_n), where each \epsilon_i is a small random number that can be negative. Require the Euclidean distance between \mathbf{p_m} and \mathbf{p} to be less than \delta, the maximum mutation size. Then the neighbors of \mathbf{p} fill the ball of radius \delta around \mathbf{p}. The number of distinct phenotypes in this ball can grow exponentially with n. So even with a tiny probability of being highly fit, some neighbor of \mathbf{p} is likely to be highly fit for reasonably large n.
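The continuous mutation model just described can be sketched as a sampler that draws \mathbf{p_m} uniformly from the ball of radius \delta around \mathbf{p}. Drawing uniformly from the ball is my assumption; the text only requires the step to be shorter than \delta. The standard trick uses a normalized Gaussian vector for the direction and a radius of \delta U^{1/n}, with U uniform on [0,1). The names mutate and mean_step are hypothetical.

```python
import math
import random


def mutate(p, delta, rng):
    """Return a mutant drawn uniformly from the n-ball of radius delta around phenotype p."""
    n = len(p)
    # uniformly random direction: normalised vector of independent standard normals
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(d * d for d in direction))
    # radius delta * U**(1/n) makes the point uniform over the ball's volume
    radius = delta * rng.random() ** (1.0 / n)
    return [pi + radius * d / norm for pi, d in zip(p, direction)]


def mean_step(n, delta=0.1, samples=2000, seed=0):
    """Average Euclidean size of a mutation in n dimensions."""
    rng = random.Random(seed)
    p = [0.0] * n
    total = 0.0
    for _ in range(samples):
        total += math.dist(p, mutate(p, delta, rng))
    return total / samples


print(mean_step(2), mean_step(50))
```

One side effect is worth noticing. The volume of a high-dimensional ball concentrates near its surface, so the expected step length is \delta n/(n+1). For n = 50, almost every uniformly drawn mutant therefore lies close to the maximum distance \delta.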

The fact that evolution may never have to cross fitness minima is extremely important. It means that most of evolution may take place on “neutral bands”. Hartl and Taubes had foreseen this really interesting result. Gavrilets mainly used it to argue for a view of speciation as a process that takes place naturally through reproductive isolation, with no need for natural selection.

Several improvements on the basic result have been achieved. Most of them show that the basic result of connectivity still holds even if highly fit phenotypes are strongly correlated, forming “highly fit islands” in phenotype space; there will be bridges between those islands. Gavrilets’ book summarizes some early results, but a more recent paper (Gravner et al.) is a real tour de force in this direction. Its last result concerns “incompatibility sets”, that is, sets of traits that destroy viability. Their existence does not punch enough holes in n-D space to disconnect it. Overall, the paper shows that even with correlation, percolation (connectedness of almost all highly fit phenotypes) is still the norm.

Next Thursday, I will detail some of my own criticisms of this model and its interpretation. The week after, I will hijack this model for my own purposes. I will attempt to show that it can display a great deal of historical contingency, leading to irreversible, Muller’s Ratchet-type evolution that proceeds in particular directions even against fitness considerations. This type of model, I believe, will provide an interesting bridge between micro- and macroevolution.