Fitness landscapes as mental and mathematical models of evolution
August 16, 2013 21 Comments
I’ve discussed this model in passing when talking about frequency-independent selection. The central idea is that we have a genetic or phenotypic space where each node is a possible genotype (or phenotype, depending on type of model) and an edge exists between two nodes if a single mutation transforms one node into the other. The easiest model to think about in this case (and which is sufficient for the rigorous qualitative features that we will care about) is the biallelic system. In this case we have loci, at each of which it is possible to have one of two alleles, thus our space is the -bit binary strings . A mutation can flip any loci from one allele to the other, thus two strings are adjacent if they differ in exactly one bit or — said more formally — are at Hamming distance one. This makes the landscape an -dimensional hypercube with the strings corresponding to genotypes as vertexes. The last ingredient – fitness — is given by a function that maps each string to a real number.
Individual organisms can be thought of as inhabiting the vertexes of the landscape corresponding to their genotype. The probability of each organism reproducing or its reproductive rate is given by the fitness value of its vertex. Note that this fitness is independent of the distribution of other agents, and hence this model is only valid for frequency-independent selection. This is an overreach of reductionism in ignoring the distribution of the whole population, but one that can sometimes be theoretically justified and at other times is conceptually or empirically convenient. In practice, this makes the model only valid on relatively short timescales since organisms tend to change their environment and thus selective pressures over extremely long timescales. The model was also not designed to incorporate frequency-dependent strategic interactions between organisms. As such, it is not useful in evolutionary game theory, although I will discuss a way to modify the model to be partially useful in a future post.
Wright’s original formulation considered a population of sexually reproducing organisms distributed over the landscape. Modern treatments, however, have usually concentrated on asexual populations. In an asexual populations with sufficiently low mutations, the probability of a double (or more) mutation during reproduction is so low that it can be ignored and only point mutations matter. In this setting, an organism’s offspring will be in the same or an adjacent vertex as its parent. Gillespie (1983; 1984) took this to the extreme by showing that in a population a mutation rate and effective population size , if then a beneficial mutation goes to fixation in the population before the next one can occur. This assumption is known as strong-selection weak-mutation (SSWM) and allows us to coarse-grain our scales to model the monomorphic population as a single point on the landscape, and evolutionary dynamics as steps from a vertex of lower fitness to an adjacent vertex of higher fitness. This hill climbing view of evolution has been very useful for proving mathematical results, but has also unfortunately led to a view of evolution as an optimization procedure and corresponding teleological conclusions.
Notice that in the SSWM model, the exact quantitative fitness differences between adjacent genotypes don’t matter; what matters is the qualitative feature of the difference being positive or negative. As such, it is tempting to replace the fitness function by a flow on the graph: instead of undirected edges between vertexes, we will direct the edges from the lower to the higher fitness genotype. Assuming that no two adjacent genotypes have exactly the same fitness then as long as the resulting directed graph is acyclic then it is possible to define some fitness function. This can be viewed as a biological analog to a von Neumann-Morgenstern rationality assumption on the landscape (with fitness interpreted as utility). Crona, Greene, & Barlow (2013) introduced this representation into theoretical biology as fitness graphs, but they have been used previously in empirical studies of fitness landscapes (De Visser et al. 2009; Franke et al., 2011; Goulart et al., 2013; Szendro et al., 2013). This approach is particularly useful empirically because it is difficult to quantitatively compare fitnesses across experiments. In theoretic work, the fitness graph approach has made the proofs of some classical theorems relating local structure to global properties easier.
A population will move along the flow on the fitness landscape from lower to higher fitness vertexes, reaching equilibrium only if it ever reaches a local or global peak (a sink in the flow model). The fact that (in finite fitness landscapes) it eventually has to reach a peak has been taken by biologists as justification for assuming that the population is already at a peak. In fact, from the earliest days to today, investigations in biology had the form (Wright, 1932):
In a rugged field of this character selection will easily carry the species to the nearest peak, but there may be innumerable other peaks which are higher but which are separated by “valleys.” The problem of evolution as I see it is that of a mechanism by which the species may continually find its way from lower to higher peaks in such a field.
Note that in the above passage, Wright implicitly assumes that a population starting away from equilibrium will get to a peak in a reasonable amount of time. However, this is an assumption computer scientists can question by asking what features of a fitness landscape allow evolution to find a peak faster than exhaustive search; this is the question both Chaitin (2009) and Valiant (2009) ask.
The simplest landscape to think about is the Mt. Fuji landscape. In this case, each locus contributes independently to fitness, there is no between locus interactions – called epistasis by biologists – in the genome. Mathematically, this means that we can write the fitness function as where each is a function of a single bit with – finding the optimal genotype is a matter of setting each bit to the position (0 or 1) that maximizes . In the landscape metaphor this is a smooth increasing hill towards the one maximum given by each locus set to the higher fitness allele; every shortest path is an adaptive path.
In this landscape, evolution is exponentially faster than exhaustive search. This speed up is easy to see, because in this case evolution is analogous to finding a marked element in a sorted list using binary search. If the organism is currently at a suboptimal genotype then a candidate mutation samples a random bit and if then an adaptive step is possible and the population “moves up” the landscape (else it samples another bit) and in the process it eliminates half of the possible remaining genotypes (i.e. all with ) from further consideration — just like binary search. It is clear that if the optimal genotype is then evolution will reach it in steps where is the number of bits on which and differ; i.e. their Hamming distance . Note that this is also the least number of steps possible for an asexual population, and even a designer can’t do better than steps if constrained to point-mutations. Of course, evolution will try and reject some non-adaptive mutations and the expected number of mutations evolution considers will set the time and be upper bounded by:
This is polynomial in — the size of the genome — and exponentially faster than exhaustive search, which would need to look at all (or most) of the possible genotypes. Notice the similarity to Chaitin’s results, this is because his ‘general abstract’ model only considers a Mt. Fuji landscape, and biologists have known since at least Kauffman & Levin (1987) that evolution on such simple landscapes is efficient. Unsurprisingly, this model is considered too simple to capture the nuances of real biological systems.
Of course, it is also not hard to see that there are landscapes where evolution — or actually any algorithm (except quantum ones, where Grover search allows a quadratic speed up, but that is still just a crawling slow ) — will be as slow as exhaustive search. Simply select one genotype to have fitness 1 and all other strings to have fitness 0. Now, starting at a random point in this landscape and trying to find the fitness peak is equivalent to searching an unordered list of exponential size. Thus, the fitness landscape model has to be more carefully specified on what we consider to be biologically plausible landscapes.
This is the first of a series of posts on computational considerations of empirical and theoretical fitness landscapes.
Chaitin, G. (2009). Evolution of Mutating Software. EATCS Bulletin, 97: 157-164
Crona, K., Greene, D., & Barlow, M. (2013). The peaks and geometry of fitness landscapes. Journal of Theoretical Biology, 317: 1-10.
de Visser, J.A.G.M., Park, S.C., & Krug, J. (2009). Exploring the effect of sex on empirical fitness landscapes. The American Naturalist.
Franke, J., Klozer, A., de Visser, J.A.G.M., & Krug, J. (2011). Evolutionary accessibility of mutation pathways. PLoS Comp. Biol. 7(8): e1002134.
Gillespie, J.H. (1983). A simple stochastic gene substitution model. Theor. Pop. Biol. 23, 202.
Gillespie, J.H. (1984). Molecular evolution over the mutational landscape. Evolution 38, 1116.
Goulart, C.P., Mentar, M., Crona, K., Jacobs, S.J., Kallmann, M., Hall, B.G., Greene, D., Barlow, M. (2013). Designing antibiotic cycling strategies by determining and understanding local adaptive landscapes. PLoS ONE 8(2): e56040.
Kauffman, S., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1): 11-45.
Szendro, I.G., Schenk, M.F., Franke, J., Krug, J., de Visser, J.A.G.M. (2013). Quantitative analyses of empirical fitness landscapes. J. Stat. Mech. P01005.
Valiant, L.G. (2009). Evolvability. Journal of the ACM 56(1): 3.
Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress of Genetics, 356-366