Criticisms of holey adaptive landscapes

:-) My cats say hello.

In my last post I wrote about holey adaptive landscapes, a model of evolution in very high-dimensional space where there is no need to jump across fitness valleys. The idea is that if we consider the phenotypes of organisms to be points in n-dimensional space, where n is some large number (say, tens of thousands, as in the number of our genes), then high-fitness phenotypes easily percolate even if they are rare. By percolate, I mean that most high-fitness phenotypes are connected in one giant component, so evolution from one high-fitness “peak” to another does not involve crossing a valley; instead, there are only well-connected high-fitness ridges. This is why the model is called “holey”: highly fit phenotypes are connected throughout the fitness landscape, but they run around large “holes” of poorly fit phenotypes that seem to be carved out of the landscape.

This is possible because as n (the number of dimensions) increases, so does the number of possible mutants that any phenotype can become. The actual rate of increase depends on the model of mutation: it can be linear, if we consider n to be the number of genes, or exponential, if we consider n to be the number of independent, continuous traits that characterize an organism. Once the number of possible mutants becomes so large that every highly fit phenotype has, on average, another highly fit phenotype as a neighbor, percolation is assured.

More formally, consider the basic model, in which highly fit phenotypes are randomly distributed over phenotype space: any phenotype has probability p_\omega of being highly fit. Let S_m be the average size of the set of all mutants of a phenotype. For example, if n is the number of genes and the only mutations we consider are loss-of-function mutations of single genes, then S_m is simply n, since that is the number of genes that can be lost and therefore the number of possible mutants. Percolation is reached if S_m>\dfrac{1}{p_\omega}, that is, if a highly fit phenotype is expected to have more than one highly fit mutant (the expected number being S_m p_\omega). Later extensions considered cases where highly fit phenotypes occur in clusters and showed that percolation is still easily achieved (Gavrilets’ book Fitness Landscapes and the Origin of Species; Gravner et al.).
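To see the threshold at work, here is a minimal Python sketch of the basic model under simplifying assumptions of my own: a genotype is a binary string of length n (one bit per gene), every single-bit flip in either direction counts as a mutation (so S_m = n, though this is not strictly loss-of-function only), and each genotype is highly fit independently with probability p_omega. The function names are hypothetical. It measures what fraction of the highly fit genotypes lie in the largest connected cluster.

```python
import random


def percolation_heuristic(s_m, p_omega):
    """Rule of thumb: percolation expected when S_m * p_omega > 1."""
    return s_m * p_omega > 1


def largest_fit_cluster_fraction(n, p_omega, seed=0):
    """Fraction of highly fit genotypes in the largest cluster on the n-cube.

    Illustrative assumption: genotypes are n-bit strings, each single-bit flip
    is one mutation, and each genotype is highly fit independently with
    probability p_omega.
    """
    rng = random.Random(seed)
    size = 1 << n
    fit = [rng.random() < p_omega for _ in range(size)]
    parent = list(range(size))  # union-find forest over genotypes

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for g in range(size):
        if not fit[g]:
            continue
        for bit in range(n):
            h = g ^ (1 << bit)  # a single-mutation neighbour of g
            if h > g and fit[h]:  # visit each edge once
                rg, rh = find(g), find(h)
                if rg != rh:
                    parent[rg] = rh

    counts = {}
    for g in range(size):
        if fit[g]:
            root = find(g)
            counts[root] = counts.get(root, 0) + 1
    if not counts:
        return 0.0
    return max(counts.values()) / sum(counts.values())


if __name__ == "__main__":
    n = 14
    for p in (0.03, 0.3):
        print(p, percolation_heuristic(n, p), round(largest_fit_cluster_fraction(n, p), 3))
```

With n = 14, n·p_omega is about 0.4 for the first value and 4.2 for the second, so the largest cluster should hold only a small share of the fit genotypes in the first case and most of them in the second. A finite n blurs the threshold, so this is an illustration rather than a measurement.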

I have several criticisms of the basic model. As an aside, I find criticism to be the best way we can honor any line of work: it means we see in it a potential worthy of a great deal of thought and improvement. Here are my criticisms:

1) We have not a clue what p_\omega is, not even the crudest ballpark figure. To grapple with this question, we must understand what makes an admissible phenotype. For example, we certainly should not consider every combination of atoms to be a phenotype. The proper way to define admissible phenotypes is to define the possible operations (mutations) that move us from one phenotype to another; that is, we must define what a mutation is. If only DNA mutations are admissible operations, and if an identical DNA string produces the same phenotype in all environments (both risible assumptions, but let’s start here), then the space of all admissible phenotypes is the set of all possible DNA strings. Let us consider only genomes a billion letters long. This space has, of course, 4^{10^9} members. What fraction of these combinations is highly fit? The answer must be a truly ridiculously small number. So small that if S_m\approx O(n), I would imagine there is no way highly fit phenotypes reach percolation.
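For scale, a short Python sketch. It assumes that S_m counts single-letter substitutions (three alternative letters per site, so S_m ≈ 3 × 10^9 for a billion-letter genome), which is one way to make S_m\approx O(n) concrete; the function names are hypothetical. Working in log10 avoids ever building the number 4^{10^9}.

```python
import math


def log10_sequence_space(length, alphabet=4):
    """log10 of the number of sequences of a given length (alphabet ** length)."""
    return length * math.log10(alphabet)


def min_p_omega_point_mutations(length, alphabet=4):
    """Smallest p_omega with S_m * p_omega > 1 when S_m counts single-letter substitutions."""
    s_m = (alphabet - 1) * length  # each site can change to any of the other letters
    return 1 / s_m


genome = 10**9
print(f"log10(4^(10^9)) = {log10_sequence_space(genome):,.0f}")
print(f"percolation needs p_omega > {min_p_omega_point_mutations(genome):.2e}")
```

The sequence space has roughly 6 × 10^8 decimal digits, while a linear S_m only requires p_omega above about 3 × 10^{-10}. My doubt is that the fraction of highly fit genomes falls far below even that bar.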

Now, if S_m\approx O(a^n), that is a different matter altogether. For example, Gravner et al. argued that a\approx 2 for continuous traits in a simple model. If n is in the tens of thousands, my intuition tells me it’s possible that highly fit phenotypes reach percolation, since exponentials make really-really-really big numbers really quickly. Humans are notoriously poor at intuiting very large and very small numbers, but the absence of fitness valleys at least becomes plausible. But… it might not matter, because:
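A quick Python comparison of how the percolation bar 1/S_m moves under the two scalings, taking a = 2 from Gravner et al.'s example and treating S_m as exactly n or exactly 2^n (an idealization; the real constants are unknown). The function name is hypothetical.

```python
import math


def log10_s_m(n, model, a=2.0):
    """log10 of the number of possible mutants S_m under two idealized scalings."""
    if model == "linear":       # S_m ~ n
        return math.log10(n)
    if model == "exponential":  # S_m ~ a**n
        return n * math.log10(a)
    raise ValueError(f"unknown model: {model}")


for n in (100, 1_000, 10_000):
    lin = log10_s_m(n, "linear")
    expo = log10_s_m(n, "exponential")
    # Percolation needs p_omega > 1 / S_m, i.e. log10(p_omega) > -log10(S_m).
    print(f"n={n:>6}: linear needs p_omega > 10^-{lin:.1f}; exponential needs p_omega > 10^-{expo:.1f}")
```

For n = 10000 the exponential bar sits near 10^{-3010}, which is why percolation becomes plausible in that regime.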

2) Populations have finite size, and evolution proceeds in finite time. Thus, the number of mutants that any phenotype will actually explore is linear in population size and time (even if the number it could potentially explore is much larger). Even if the number of mutants S_m grows exponentially with n, it doesn’t matter if there is never enough population or time to explore that giant number of mutants. Likewise, it doesn’t matter that highly fit phenotypes form a percolating cluster if the ridges that connect peaks aren’t thick enough to be discovered. For evolution never to have to cross fitness valleys, there must not only be highly fit neighbors; those neighbors must be common enough to be discovered. Otherwise, if every mutant a population can realistically discover has low fitness, evolution has to cross fitness valleys anyway.
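The distinction can be made concrete with a short Python sketch. If a population tries N mutants and each is highly fit independently with probability p_omega (an assumption, since real mutants are not independent draws), the chance of finding at least one is 1-(1-p_omega)^N, which depends on N, not on S_m. The function names and the numbers in the example are hypothetical and illustrative, not estimates.

```python
import math


def prob_discover(n_sampled, p_omega):
    """P(at least one of n_sampled random mutants is highly fit) = 1 - (1 - p_omega)**n_sampled.

    log1p/expm1 keep the result accurate when p_omega is astronomically small.
    """
    return -math.expm1(n_sampled * math.log1p(-p_omega))


def percolates_log10(n, a, log10_p_omega):
    """Percolation criterion S_m * p_omega > 1 with S_m = a**n, evaluated in log10."""
    return n * math.log10(a) + log10_p_omega > 0


# Illustrative numbers: S_m = 2**10000 mutants, p_omega = 1e-100, 1e30 mutants actually tried.
print(percolates_log10(10_000, 2, -100))  # the percolation criterion is met
print(prob_discover(1e30, 1e-100))        # yet a highly fit mutant is essentially never found
```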

How much time and population is realistic? Consider bacteria, which number around 5\times 10^{30}. For generation time, let’s say they divide once every twenty minutes, the standard optimal laboratory doubling time for E. coli; most bacteria in natural conditions have much longer generation times, so this is generous. Then, if bacteria evolved 4.5 billion years ago, there have been approximately 118260000000000, or ~1.2\times 10^{14}, generations. The total number of bacteria sampled across all of evolution is therefore on the order of 6\times 10^{44}. Does that sound like a large number? It’s not. That’s the trouble with linear growth. Against 4^{10^9}, it is nothing. Even against 2^{10000} (taking n=10000 as the number of dimensions), 6\times 10^{44} is nothing. That is, we simply don’t have time to test all the mutants. Highly fit phenotypes had better make up more than \dfrac{1}{6\times 10^{44}} of the phenotype space, or we will never discover them. Is \dfrac{1}{6\times 10^{44}} small? Yes. Is it small enough? I’m not sure. Possibly not. In any case, this is the proper number to consider, not, say, 2^{10000}. That S_m\approx O(a^n) is so large is a moot point.
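The arithmetic behind these figures, as a Python sketch using exact integers. Like the estimate in the text, it assumes 365-day years, a constant population of 5\times 10^{30}, and a fixed twenty-minute generation; the function name is hypothetical.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def total_bacteria_sampled(population=5 * 10**30, years=4_500_000_000, generation_minutes=20):
    """Return (generations, population * generations) using exact integer arithmetic."""
    generations = years * MINUTES_PER_YEAR // generation_minutes
    return generations, population * generations


gens, sampled = total_bacteria_sampled()
print(f"generations: {gens:,}")
print(f"log10(total sampled): {math.log10(sampled):.2f}")
# Sizes of the mutant spaces from the text, in log10 for comparison.
print(f"log10(2^10000)  = {10_000 * math.log10(2):.1f}")
print(f"log10(4^(10^9)) = {10**9 * math.log10(4):.3e}")
```

The total sampled comes to about 10^{44.8}, against roughly 10^{3010} for 2^{10000}: even with every bacterium that ever lived, only a vanishing fraction of such a space gets tried.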

3) I consider my last criticism the most difficult one for the model to answer. The holey adaptive landscapes model does not take environmental variation into account. To a great extent, it conflates the viable with the highly fit. In his book, Gavrilets often uses the term “viable”, but under the usual definition of viable, namely capable of reproduction, most viable phenotypes are clearly not highly fit. Different viable phenotypes might be highly fit under different environmental conditions, but fitness itself has little meaning outside of a particular environment.

Including environmental conditions in this model is not straightforward. Let us apply the basic model to viable phenotypes, that is, strings of DNA that are capable of reproduction in some environment. Let us grant that everything Gavrilets et al. say is correct with respect to viable phenotypes: they form a percolating cluster, and so on. Now, in a particular environment, these viable phenotypes will have different fitnesses. If we further consider only the phenotypes that are highly fit in a certain environment, then for them to form a percolating cluster we would have to apply the reasoning of the model a second time. All viable phenotypes would have to be connected to so many other viable phenotypes that among them would be another highly fit phenotype. Here, we take “highly fit” to mean viable phenotypes with relative fitness greater than 1-\epsilon, where the fittest phenotype has relative fitness 1. This further dramatizes the inability of evolution to hit on “highly fit” phenotypes through a single mutation with realistic population size and time, since we must consider not p_\omega but p_v\times p_\omega, where p_v is the probability of being viable and p_\omega is the probability of being highly fit. Both of these probabilities are almost certainly astronomically small, making the burden on the paltry 6\times 10^{44} even heavier.
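Both steps can be sketched in Python, reading p_\omega as the probability of being highly fit given viability so that the two probabilities multiply. The fitness values and probabilities in the example are made up for illustration, and the function names are hypothetical.

```python
def highly_fit_fraction(fitnesses, epsilon):
    """Fraction of viable phenotypes whose relative fitness exceeds 1 - epsilon.

    Relative fitness is each value divided by the fittest one, so the best has 1.
    """
    best = max(fitnesses)
    return sum(1 for w in fitnesses if w / best > 1 - epsilon) / len(fitnesses)


def expected_highly_fit_hits(n_sampled, p_v, p_omega):
    """Expected number of sampled mutants that are both viable and highly fit."""
    return n_sampled * p_v * p_omega


# Made-up fitness values for four viable phenotypes in one environment.
print(highly_fit_fraction([1.0, 0.99, 0.5, 0.2], epsilon=0.05))  # 0.5
# Made-up probabilities; the true values are unknown.
print(expected_highly_fit_hits(6e44, p_v=1e-30, p_omega=1e-20))
```

Each extra filter multiplies into the probability, so the expected number of hits from a fixed sampling budget shrinks accordingly.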

It’s my belief, then, that in realistic evolution with finite population and time, fitness valleys still have to be crossed. Either there are no highly fit phenotypes a single mutation away, or, if such mutations exist, the space of all possible mutations is so large that it cannot be fully sampled with finite population and time. The holey adaptive landscapes approach does not entirely circumvent the old problem of having to cross fitness valleys.

Next Thursday, I will seek to hijack this model for my own uses, as a model of macroevolution.