
In my last post I wrote about holey adaptive landscapes, a model of evolution in very high dimensional space where there is no need to jump across fitness valleys. The idea is that if we consider the phenotypes of organisms to be points in n-dimensional space, where n is some large number (say, tens of thousands, as in the number of our genes), then high fitness phenotypes easily percolate even if they are rare. By percolate, I mean that most high-fitness phenotypes are connected in one giant component, so evolution from one high fitness “peak” to another does not involve crossing a valley. Instead, there are only well-connected ridges of high fitness. This is why the model is considered “holey”: highly fit phenotypes are connected throughout the fitness landscape, but they run around large “holes” of poorly fit phenotypes that seem to be carved out of the landscape.

This is possible because as n (the number of dimensions) increases, the number of possible mutants that any phenotype can become also increases. The actual rate of increase depends on the model of mutation: it can be linear, if we consider n to be the number of genes, or exponential, if we consider n to be the number of independent but continuous traits that characterize an organism. Once the number of possible mutants becomes so large that every highly fit phenotype has, on average, at least one other highly fit phenotype as a neighbor, percolation is assured.

More formally, we can consider the basic model where highly fit phenotypes are randomly distributed over phenotype space: any phenotype has probability $p_\omega$ of being highly fit. Let $S_m$ be the average size of the set of all mutants of any phenotype. For example, if n is the number of genes and the only mutations we consider are the loss-of-function of single genes, then $S_m$ is simply n, since this is the number of genes that can be lost, and therefore the number of possible mutants. Percolation is reached if $S_m>\dfrac{1}{p_\omega}$, that is, if the expected number of highly fit neighbors, $S_m p_\omega$, exceeds one. Later extensions also considered cases where highly fit phenotypes exist in clusters, and showed that percolation is still easily achievable (Gavrilets’ book Fitness Landscapes and the Origin of Species; Gravner et al.).
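To get a feel for the threshold $S_m>\dfrac{1}{p_\omega}$, here is a minimal toy simulation. It rests on assumptions of my own that are not part of the original model: genotypes are binary strings of length n, a mutation flips a single gene (so $S_m = n$), and each genotype is independently highly fit with probability `p_fit`. The function name and parameters are hypothetical. It measures what fraction of the highly fit genotypes sit in the largest connected component.

```python
import random
from collections import deque


def largest_fit_component_fraction(n, p_fit, seed=0):
    """Fraction of highly fit genotypes that lie in the largest connected cluster.

    Toy assumptions (not from the original model): genotypes are the 2**n
    binary strings of length n, neighbors differ in exactly one gene, and
    each genotype is highly fit independently with probability p_fit.
    """
    rng = random.Random(seed)
    size_of_space = 2 ** n
    fit = [rng.random() < p_fit for _ in range(size_of_space)]
    total_fit = sum(fit)
    if total_fit == 0:
        return 0.0

    seen = [False] * size_of_space
    largest = 0
    for start in range(size_of_space):
        if not fit[start] or seen[start]:
            continue
        # Breadth-first search over highly fit single-mutation neighbors.
        seen[start] = True
        queue = deque([start])
        component_size = 0
        while queue:
            genotype = queue.popleft()
            component_size += 1
            for gene in range(n):
                neighbor = genotype ^ (1 << gene)  # flip one gene
                if fit[neighbor] and not seen[neighbor]:
                    seen[neighbor] = True
                    queue.append(neighbor)
        largest = max(largest, component_size)
    return largest / total_fit


if __name__ == "__main__":
    n = 12  # S_m = 12, so the heuristic threshold is p_fit ~ 1/12
    for p_fit in (0.02, 0.08, 0.2, 0.5):
        frac = largest_fit_component_fraction(n, p_fit)
        print(f"p_fit={p_fit:<5} S_m*p_fit={n * p_fit:5.2f} largest cluster fraction={frac:.3f}")
```

With n this small the transition is blurry, but well below $p_\omega = 1/n$ the fit genotypes should be scattered in tiny clusters, and well above it nearly all of them should join a single component.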

I have several criticisms of the basic model. As an aside, I find criticism to be the best way we can honor any line of work: it means we see a potential worthy of a great deal of thought and improvement. My criticisms are as follows:

1) We have not a clue what $p_\omega$ is, not the crudest ball-park idea. To grapple with this question, we must understand what makes an admissible phenotype. For example, we certainly should not consider any combination of atoms to be a phenotype. The proper way to define an admissible phenotype is by defining the possible operations (mutations) that move us from one phenotype to another; that is, we must define what a mutation is. If only DNA mutations are admissible operations, and if an identical DNA string produces the same phenotype in all environments (both risible assumptions, but let’s start here), then the space of all admissible phenotypes is the set of all possible strings of DNA. Let us consider only genomes of a billion letters in length. The size of this space is, of course, $4^{10^9}$. What fraction of these combinations are highly fit? The answer must be a truly ridiculously small number. So small that if $S_m\approx O(n)$, I would imagine that there is no way that highly fit phenotypes reach percolation.

Now, if $S_m\approx O(a^n)$, that is a different matter altogether. For example, Gravner et al. argued that $a\approx 2$ for continuous traits in a simple model. If n is in the tens of thousands, my intuition tells me it’s possible that highly fit phenotypes reach percolation, since exponentials make really-really-really big numbers really quickly. Even allowing for the well-known evidence that humans are terrible at intuiting giant and tiny numbers, the absence of fitness valleys becomes at least plausible. But… it might not matter, because:

2) Populations have finite size, and evolution moves in finite time. Thus, the number of possible mutants that any lineage will in fact explore is at most linear in population size and time (even if the number it could potentially explore is much larger). Even if the number of mutants, $S_m$, grows exponentially with n, it doesn’t matter if we never have enough population or time to explore that giant number of mutants. Likewise, it doesn’t matter that highly fit phenotypes form a percolating cluster if the ridges that connect peaks aren’t thick enough to be discovered. Not only must there be highly fit neighbors, but in order for evolution never to have to cross fitness valleys, highly fit neighbors must be common enough to be discovered. Otherwise, if everything that populations realistically discover is of low fitness, then evolution has to cross fitness valleys anyway.
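The "common enough to be discovered" condition can be made concrete with a small sketch. It assumes, as a simplification of my own, that every mutant ever sampled is an independent draw that is highly fit with the same probability `p_fit`; the function name and the example numbers are hypothetical. Under that assumption the chance of finding at least one highly fit mutant in `num_sampled` draws is $1-(1-p)^N$, which is roughly $Np$ when $Np$ is small.

```python
import math


def discovery_probability(p_fit, num_sampled):
    """Chance that at least one of num_sampled mutants is highly fit.

    Assumes (a simplification, not a claim from the model) that each
    sampled mutant is highly fit independently with probability p_fit.
    Uses log1p/expm1 so that astronomically small p_fit does not round to 0.
    """
    if p_fit >= 1.0:
        return 1.0
    # 1 - (1 - p)^N, computed as -expm1(N * log(1 - p)) for numerical stability.
    return -math.expm1(num_sampled * math.log1p(-p_fit))


if __name__ == "__main__":
    # Purely illustrative values: a sampling budget and a range of rarities.
    budget = 1e20
    for p_fit in (1e-30, 1e-22, 1e-20, 1e-18):
        print(f"p_fit={p_fit:.0e}  P(discover)={discovery_probability(p_fit, budget):.3e}")
```

The point of the sketch is that only the product of the sampling budget and $p_\omega$ matters: once that product is far below one, the size of $S_m$ is irrelevant to what a real population can find.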

How much time and population is realistic? Let’s consider bacteria, which number around $5\times 10^{30}$. For generation time, let’s say they divide once every twenty minutes, the standard optimal laboratory doubling time for E. coli. Most bacteria in natural conditions have much slower generation times, so this is a generous upper bound. Then if bacteria evolved 4.5 billion years ago, there have been approximately 118260000000000, or ~$1.2\times 10^{14}$, generations. The total number of bacteria sampled across all of evolution is therefore on the order of $6\times 10^{44}$. Does that sound like a large number? Because it’s not. That’s the trouble with linear growth. Against $4^{10^9}$, this is nothing. Even against $2^{10000}$ (where we take n, the number of dimensions, to be $10000$), $6\times 10^{44}$ is nothing. That is, we simply don’t have time to test all the mutants. Highly fit phenotypes had better make up more than $\dfrac{1}{6\times 10^{44}}$ of the phenotype space, or else we’ll never discover them. Is $\dfrac{1}{6\times 10^{44}}$ small? Yes. Is it small enough? I’m not sure. Possibly not. In any case, this is the proper number to consider, not, say, $2^{10000}$. The fact that $S_m\approx O(a^n)$ is so large is a moot point.
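The arithmetic above is easy to check. The sketch below uses only the figures quoted in this post (a standing population of $5\times 10^{30}$, one division every twenty minutes, 4.5 billion years) and compares everything on a $\log_{10}$ scale, since the search-space sizes are far too large to hold as floating-point numbers. The variable names are my own.

```python
import math

# Figures quoted in the post.
BACTERIA_ALIVE = 5e30          # standing number of bacteria
GENERATIONS_PER_HOUR = 3       # one division every twenty minutes
YEARS_OF_EVOLUTION = 4.5e9

generations = YEARS_OF_EVOLUTION * 365 * 24 * GENERATIONS_PER_HOUR
total_sampled = BACTERIA_ALIVE * generations

# Compare orders of magnitude; 2**10000 and 4**(10**9) overflow a float.
log10_sampled = math.log10(total_sampled)
log10_trait_space = 10_000 * math.log10(2)   # 2^10000
log10_dna_space = 1e9 * math.log10(4)        # 4^(10^9)

if __name__ == "__main__":
    print(f"generations            ~ {generations:.4e}")
    print(f"bacteria ever sampled  ~ 10^{log10_sampled:.1f}")
    print(f"2^10000                ~ 10^{log10_trait_space:.1f}")
    print(f"4^(10^9)               ~ 10^{log10_dna_space:.3e}")
```

On this scale the sampling budget is about $10^{45}$, while $2^{10000}$ is about $10^{3010}$ and $4^{10^9}$ is about $10^{6\times 10^8}$.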

3) I consider my last criticism the most difficult one for the model to answer. The holey adaptive landscapes model does not take into account environmental variation. To a great extent, it confuses the viable with the highly fit. In his book, Gavrilets often uses the term “viable”, but if we use the usual definition of viable, that is, capable of reproduction, then clearly most viable phenotypes are not highly fit. Different viable phenotypes might be highly fit under different environmental conditions, but fitness itself has little meaning outside of a particular environment.

A straightforward inclusion of environmental conditions into this model is not easy. Let us take the basic model to apply to viable phenotypes, that is, strings of DNA that are capable of reproduction under some environment. Let us say that everything Gavrilets et al. have to say is correct with respect to viable phenotypes: that they form a percolating cluster, and so on. Now, in a particular environment, these viable phenotypes will have different fitnesses. If we further consider only the highly fit phenotypes within a certain environment, then for these highly fit phenotypes to form a percolating cluster, we would have to apply the reasoning of the model a second time. It would mean that every viable phenotype must be connected to so many other viable phenotypes that among them there would be another highly fit phenotype. Here, we take “highly fit” to mean those viable phenotypes that have relative fitness greater than $1-\epsilon$, where the fittest phenotype has relative fitness $1$. This makes it even harder for evolution to strike on “highly fit” phenotypes through a single mutation in realistic population size and time, since we must consider not $p_\omega$ but $p_v\times p_\omega$, where $p_v$ is the probability of being viable and $p_\omega$ is now the probability that a viable phenotype is highly fit. Both of these probabilities are almost certainly astronomically small, making the burden on the pitifully small number $6\times 10^{44}$ even heavier.

It’s my belief, then, that in realistic evolution with finite population and time, fitness valleys nevertheless have to be crossed. Either there are no highly fit phenotypes a single mutation away, or, if such mutations exist, the space of all possible mutations is so large as to be impossible to fully sample with finite population and time. The old problem of having to cross fitness valleys is not entirely circumvented by the holey adaptive landscapes approach.

Next Thursday, I will seek to hijack this model for my own uses, as a model of macroevolution.

2 Responses to Criticisms of holey adaptive landscapes

1. I always become very concerned when I see specific numbers and generic growing variables (like n) mixed so much. It makes it really hard for me to understand what you are trying to say. I think if you are thinking of things like percolation, etc., you have already committed yourself to a model where you will analyze something that depends on n, and then n will go to infinity. I think this is a great approach: even though you might never actually handle arbitrarily large objects, there is no reason why your theory shouldn’t. That is one of the beauties of TCS.

I would really like to see this model stated (or criticized) in terms that are convincing even if I adopt big-O notation, i.e., where 3, 74, and 2^345921 are all the same to me, since they are simply constants, and smaller than n/684902, since that will EVENTUALLY become bigger than any of them. You might object that “hey, we look at biology and we see specific large genomes”, to which I will respond: “do you want a general theory of evolution, or a specific theory that matches your current observations?”
