Epistasis and empirical fitness landscapes

Biologists tend to focus on nuances (to the point that Rutherford considered the field to be stamp collecting) and on very local properties of systems, leading at times to rather reductionist views. These approaches are useful for connecting to experiment, but they can underspecify conceptual models that need a more holistic approach. In the case of fitness landscapes, the metric that biologists study is epistasis, the amount of interaction between loci, and it is usually considered for just two loci at a time, although Beerenwinkel et al. (2007a,b) have recently introduced a geometric theory of gene interaction for considering epistasis across any number of loci. In contrast, more holistic measures can be as simple as the number of peaks in the landscape, or as complicated as the computational or global combinatorial features of interest to theoretical computer scientists. In this post I discuss connections between the two and provide a brief overview of the empirical work on fitness landscapes.

Epistasis in fitness graphs of two loci. Arrows point from lower fitness to higher fitness, and AB always has higher fitness than ab. From left to right: no epistasis, sign epistasis, reciprocal sign epistasis.

For two loci, there are three types of epistasis: magnitude, sign, and reciprocal sign. To explain these types, we will consider two loci, the first with alleles a and A and the second with alleles b and B. Assume that the upper-case combination is fitter: $f(ab) < f(AB)$. If there is no epistasis then the fitness effects are additive and independent of background: $f(AB) - f(aB) = f(Ab) - f(ab)$ and $f(AB) - f(Ab) = f(aB) - f(ab)$. In magnitude epistasis this additivity is broken, but the signs remain: $f(AB) > f(aB) > f(ab)$ and $f(AB) > f(Ab) > f(ab)$. Fitness graphs (Crona et al., 2013) suppress the quantitative fitness function and keep only a qualitative feature: an arrow from x to y whenever the two vertices are adjacent and the fitness of x is less than the fitness of y. This model does not distinguish between no epistasis and magnitude epistasis, and since it captures the qualitative behavior of strong-selection weak-mutation (SSWM) adaptive walks, it is not surprising that magnitude epistasis does not matter for the qualitative behavior of adaptation.
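A minimal Python sketch of the fitness-graph construction, assuming genotypes are encoded as tuples of 0/1 alleles (0 for the lower-case allele, 1 for the upper-case one) and that the landscape is a dictionary covering every genotype. The function names and fitness values are hypothetical, chosen only so that one landscape is additive and the other has magnitude epistasis. The interaction term $f(AB) - f(aB) - f(Ab) + f(ab)$ is zero exactly when the additivity conditions above hold.

```python
def fitness_graph(f):
    """Arcs (x, y) between one-mutant neighbours x, y with f(x) < f(y).

    f maps genotype tuples of 0/1 alleles to fitness; every genotype
    of the hypercube must be present.
    """
    arcs = set()
    for x in f:
        for i in range(len(x)):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip the allele at locus i
            if f[x] < f[y]:
                arcs.add((x, y))
    return arcs


def interaction(f):
    """Two-locus interaction term f(AB) - f(aB) - f(Ab) + f(ab)."""
    return f[(1, 1)] - f[(0, 1)] - f[(1, 0)] + f[(0, 0)]


# Genotype encoding: ab = (0, 0), Ab = (1, 0), aB = (0, 1), AB = (1, 1).
# Hypothetical fitness values in arbitrary units.
additive = {(0, 0): 10, (1, 0): 12, (0, 1): 13, (1, 1): 15}
magnitude = {(0, 0): 10, (1, 0): 12, (0, 1): 13, (1, 1): 20}

print(interaction(additive), interaction(magnitude))        # 0 5
print(fitness_graph(additive) == fitness_graph(magnitude))  # True
```

Both landscapes yield the same four arcs (every mutation from lower- to upper-case is uphill), which is the sense in which the fitness graph cannot see magnitude epistasis.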

Sign and reciprocal sign epistasis, however, show up as distinct in fitness graphs, as shown in the opening figure. A system has sign epistasis if it violates one of the two conditions for magnitude epistasis. For example, if $f(AB) > f(aB) > f(ab) > f(Ab)$ then there is negative sign epistasis at the first locus: if the second locus is b then the mutation from a to A is not adaptive, but if the second locus is B then the mutation from a to A is adaptive. We can also consider positive sign epistasis at the first locus, $f(aB) > f(AB) > f(Ab) > f(ab)$, where the mutation from a to A is adaptive on the b background but not on the B background. It is easy to see that if there is no sign epistasis then every shortest path from $x$ to an optimum $x^*$ is an adaptive path, and thus we have a smooth or Mt. Fuji landscape (Weinreich et al., 2005; Crona et al., 2013). Thus (and unsurprisingly) smoothness is a local property and amenable to a fully reductionist treatment.
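The Mt. Fuji claim can be checked by brute force on small examples. The sketch below, again assuming 0/1 tuple genotypes and a complete fitness dictionary, enumerates every shortest path between two genotypes (one per ordering of the loci at which they differ) and counts how many are adaptive, i.e. strictly increase fitness at every step. The helper names and fitness values are illustrative assumptions; the first landscape encodes the negative sign epistasis example $f(AB) > f(aB) > f(ab) > f(Ab)$.

```python
from itertools import permutations


def flip(x, i):
    """Return genotype x with the allele at locus i switched."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def adaptive_shortest_paths(f, start, target):
    """Return (number of adaptive shortest paths, number of shortest paths)."""
    differing = [i for i in range(len(start)) if start[i] != target[i]]
    adaptive = total = 0
    for order in permutations(differing):
        total += 1
        x, uphill = start, True
        for i in order:
            y = flip(x, i)
            if f[y] <= f[x]:  # a neutral or downhill step breaks adaptivity
                uphill = False
                break
            x = y
        if uphill:
            adaptive += 1
    return adaptive, total


# ab = (0, 0), Ab = (1, 0), aB = (0, 1), AB = (1, 1); hypothetical values.
sign = {(1, 1): 4, (0, 1): 3, (0, 0): 2, (1, 0): 1}         # f(AB) > f(aB) > f(ab) > f(Ab)
no_sign = {(0, 0): 10, (1, 0): 12, (0, 1): 13, (1, 1): 20}  # magnitude epistasis only

print(adaptive_shortest_paths(sign, (0, 0), (1, 1)))     # (1, 2)
print(adaptive_shortest_paths(no_sign, (0, 0), (1, 1)))  # (2, 2)
```

Sign epistasis blocks the path through Ab but leaves the path through aB open, whereas without sign epistasis both orderings are uphill. Enumerating permutations is factorial in the number of differing loci, so this brute force is only practical for very small landscapes.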

Note that the shortest paths from abc to ABc are blocked by the reciprocal sign epistasis of the red edges. However, an alternative adaptive path exists along the green edges that first introduces the C allele to reach ABC, but then removes it to return to ABc.

Ruggedness, or more specifically having multiple peaks, however, is a more holistic property that is only weakly related to epistasis. A system has reciprocal sign epistasis if both conditions of magnitude epistasis are broken, or (equivalently) if there is sign epistasis at both loci (Poelwijk et al., 2007). An example of negative reciprocal sign epistasis would be $f(AB) > f(ab)$ but $f(ab) > f(Ab)$ and $f(ab) > f(aB)$. The presence of reciprocal sign epistasis is a necessary condition for multiple peaks: if there are multiple peaks then there must be at least one pair of loci with reciprocal sign epistasis (Poelwijk et al., 2007, 2011). However, reciprocal sign epistasis is not sufficient, since evolution can use a third locus to go around the fitness valley, as in the three-locus example above. In fact, there is no local property in terms of just reciprocal sign epistasis that is sufficient for the existence of multiple peaks (Crona et al., 2013). A sufficient condition can be given in terms of reciprocal and single sign epistasis: if there is reciprocal sign epistasis but no pair of loci with just single sign epistasis (i.e. sign epistasis at only one of the two loci, as in the examples of the previous paragraph) then multiple peaks must exist (Crona et al., 2013). Unfortunately, this condition is not necessary, because we can have multiple peaks in systems where single sign epistasis exists. I expect that there is no local property at all (and I would also conjecture that there is no property testable in polynomial time) that is both necessary and sufficient for multiple peaks. In other words, if we want to understand rugged fitness landscapes then we cannot adopt an overly reductionist view.
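To make the third-locus detour concrete, here is a brute-force sketch with a hypothetical three-locus landscape. Its fitness values were chosen (not measured) to match the three-locus figure caption: A and B show reciprocal sign epistasis on the c background, yet the landscape has a single peak at ABc, reachable via abc, abC, AbC, ABC, ABc. Genotypes are 0/1 tuples in locus order (A, B, C). The function names are assumptions for illustration, and the sign test treats a zero fitness effect as having no sign. The same scan over all pairs and backgrounds could be run on any complete empirical landscape, such as those surveyed by Szendro et al. (2013).

```python
from itertools import combinations


def flip(x, i):
    """Return genotype x with the allele at locus i switched."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def peaks(f):
    """Genotypes fitter than every one-mutant neighbour (strict local maxima)."""
    return [x for x in f if all(f[x] > f[flip(x, i)] for i in range(len(x)))]


def set_alleles(background, i, ai, j, aj):
    """Copy of background with allele ai at locus i and aj at locus j."""
    x = list(background)
    x[i], x[j] = ai, aj
    return tuple(x)


def sign_epistasis_at(f, i, j, background):
    """Does the fitness effect of mutating locus i flip sign with locus j?"""
    effect_j0 = f[set_alleles(background, i, 1, j, 0)] - f[set_alleles(background, i, 0, j, 0)]
    effect_j1 = f[set_alleles(background, i, 1, j, 1)] - f[set_alleles(background, i, 0, j, 1)]
    return effect_j0 * effect_j1 < 0


def reciprocal_pairs(f):
    """Pairs of loci with reciprocal sign epistasis in at least one background."""
    n = len(next(iter(f)))
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        for bg in f
        if sign_epistasis_at(f, i, j, bg) and sign_epistasis_at(f, j, i, bg)
    }


def genotype(name):
    """'aBc' -> (0, 1, 0): upper-case letters mark the second allele."""
    return tuple(int(c.isupper()) for c in name)


# Hypothetical fitness values (arbitrary units) for the three-locus example.
f = {genotype(g): w for g, w in [
    ("abc", 10), ("Abc", 5), ("aBc", 5), ("ABc", 30),
    ("abC", 15), ("AbC", 20), ("aBC", 12), ("ABC", 25),
]}

print(reciprocal_pairs(f))  # {(0, 1)}: loci A and B
print(peaks(f))             # [(1, 1, 0)]: ABc is the only peak
```

Reciprocal sign epistasis is present, but the detour through the C allele removes the second peak it might otherwise create.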

Two examples of empirical fitness landscapes from Figure 1 of Szendro et al. (2013). The left fitness landscape is based on the data of Chou et al. (2011); it contains a single optimum (1111) and is a smooth landscape with no sign epistasis. The right landscape is based on data from Lozovsky et al. (2009) and has both single sign and reciprocal sign epistasis, and two peaks. The first peak is 1100 and its basin of attraction is shown in red; the second is 0101, with a blue basin of attraction. Can you find which pair of loci has the reciprocal sign epistasis?

Unfortunately, reductionist views are ingrained in experimental biology, and this makes empirical tests of ruggedness extremely difficult (Whitlock et al., 1995; Kryazhimskiy et al., 2009). In particular, most experimental results don’t actually attempt to measure the fitness landscape, but instead just report average fitness versus time and the average number of acquired adaptations versus time (Lenski & Travisano, 1994; Cooper & Lenski, 2000; Barrick et al., 2009; Kryazhimskiy et al., 2009). Szendro et al. (2013) surveyed the few recent experiments that methodically examined all combinations of mutations at a subset of loci in model organisms. Half of these studies (6 out of 12) could empirically realize only small fitness landscapes of 4 to 5 loci (so 16 to 32 vertices); the largest complete fitness landscape had 6 loci with all 64 vertices examined (Hall et al., 2010), and the largest number of vertices in a single study was 418 out of the possible 512 in a 9-locus landscape (O’Maille et al., 2008). Szendro et al. (2013) compared these landscapes to popular theoretical models along many measures. To get decent statistics, and because it is unknown how many of the measures scale with landscape size while the smallest examples had only 4 loci, the authors restricted themselves to four-locus sub-samples of the empirical data. Of the models they considered, the rough Mt. Fuji model (a standard Mt. Fuji landscape with Gaussian noise added to individual vertices) fit the data best. Unfortunately, these fitness landscapes are far too small to be useful for distinguishing the qualitative dynamics of interest to theoretical computer science. 
In particular, with $n = 4$ we can’t even tell the difference between quadratic time (usually associated with random walks) and exponential time (associated with exhaustive search), since $n^2 = 2^n = 16$ at $n = 4$. So in terms of search time, such small landscapes can’t even distinguish the two extremes of ordered and unordered search discussed in the last entry. A four-locus landscape is simply too local a property, and not much more informative than the reductionist two-locus analysis of epistasis. However, the biological intuition is that real landscapes are a little rough, with multiple optima but not as many as completely uncorrelated models predict.
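A hedged sketch of the comparison behind that intuition: sample small landscapes and count their peaks. The parameterisation is an assumption for illustration, not Szendro et al.'s fitted model. Fitness rises linearly (with slope `slope`) in the number of upper-case alleles, i.e. it falls with Hamming distance from the all-ones genotype, plus independent Gaussian noise of standard deviation `sigma` at each vertex. Setting `slope = 0` gives a completely uncorrelated (House of Cards) landscape. In that case each of the $2^n$ genotypes is a peak with probability $1/(n+1)$, so the expected number of peaks is $2^n/(n+1)$, or 3.2 at $n = 4$.

```python
import random
from itertools import product


def flip(x, i):
    """Return genotype x with the allele at locus i switched."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def count_peaks(f):
    """Number of genotypes fitter than all of their one-mutant neighbours."""
    return sum(all(f[x] > f[flip(x, i)] for i in range(len(x))) for x in f)


def rough_mt_fuji(n, slope, sigma, rng):
    """Hypothetical rough Mt. Fuji landscape on n biallelic loci:
    linear trend toward the all-ones genotype plus i.i.d. Gaussian noise."""
    return {
        x: slope * sum(x) + rng.gauss(0.0, sigma)
        for x in product((0, 1), repeat=n)
    }


def mean_peaks(n, slope, sigma, samples=2000, seed=0):
    """Average peak count over independently sampled landscapes."""
    rng = random.Random(seed)
    total = sum(count_peaks(rough_mt_fuji(n, slope, sigma, rng)) for _ in range(samples))
    return total / samples


for slope, sigma in [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]:
    print(f"slope={slope}, sigma={sigma}: {mean_peaks(4, slope, sigma):.2f} peaks on average")
```

The noiseless Mt. Fuji landscape always has exactly one peak, and the uncorrelated one averages close to 3.2, with the rough model in between. With only 16 vertices, though, such local counts are about all these models can disagree on.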

As we saw in the previous post, mathematical biologists like to assume that organisms are at equilibrium or, when perturbed from equilibrium by a change in environment, return to a new one relatively quickly. Is this a reasonable assumption? From a genome-wide perspective, it seems to be at odds with the intuition of naturalists. Consider, for example, vestigial features of your own body like your appendix, goose bumps, tonsils, wisdom teeth, third eyelid, or the second joint in the middle of your foot made immobile by a tightened ligament. Wouldn’t it be more efficient (and thus produce marginally higher fitness) if you didn’t spend the energy to construct these features? Of course, this naturalist argument is not convincing, since we don’t know if there are any small mutations that could remove these vestigial features from our development; I could just be describing a different local optimum that lies on the other side of a fitness valley from my current vertex. This example is further complicated because the concept of equilibrium is different for sexual organisms that are capable of recombination (or even for large populations with mechanisms like plasmids for horizontal gene transfer) and often does not correspond to something as simple as a peak in the fitness landscape (Livnat et al., 2008).

The other tempting naturalist example, macroevolutionary change such as speciation, is also not convincing for static fitness landscapes. The usual retort is that on these timescales the environment is not constant and depends on the organisms through mechanisms like niche construction or frequency dependence. This defense of local equilibria is actually a central part of the punctuated equilibrium theory of evolution: the environment changes (either through an external effect like a meteor impact or an internal effect like migration or niche construction) and the wild type is no longer locally optimal, but adaptation quickly carries the species to a new nearby local optimum where it remains for a long period of time until the next environmental change. Naturalistic observations are insufficient to settle this question, so we need to turn to experiments.

One of the earliest success stories for mathematical models of rugged landscapes was an application to affinity maturation (Kauffman & Weinberger, 1989). The evolutionary process leading to affinity maturation is very short: typically a local equilibrium is found after only 6-8 nucleotide changes in the complementarity-determining regions (CDRs) (Crews et al., 1981; Tonegawa, 1983; Clark et al., 1985), so only a few point mutations are needed to develop a drastically better tuned antibody, an adaptive process that happens on the order of days. The agreement of Kauffman & Weinberger’s (1989) rugged fitness landscape model with these empirical data supported the usefulness of studying fitness landscapes and continued to propagate the view that adaptation quickly finds local equilibria, but I think there are two reservations to keep in mind. First, the adapted B-cells were not experimentally isolated, and not all of their point mutations were checked to guarantee that a fitness peak had been reached. In both theoretical and experimental treatments of evolution, it is known that fitness increases tend to show a pattern of geometrically diminishing returns (Lenski & Travisano, 1994; Bull et al., 1997; Orr, 1998; Cooper & Lenski, 2000; Kryazhimskiy et al., 2009), which means that after a few adaptive steps the fitness change will be so small that its fixation time in the large population of B-cells will exceed the time for which the pathogen causing the immune response is present. We might not be seeing more steps because the next steps have fitness increases too small to fix before the environment (and thus the fitness landscape) changes again. Second, the AID protein (and other mechanisms) increase the mutation rate along the genes encoding antibodies by a factor of $10^6$, which shows that this fitness landscape has been shaped by the previous evolution of the human immune system to find fit mutants as quickly as possible. 
This biases the phenomenon towards landscapes where local maxima are easier to find than usual, and thus makes it a poor candidate for studying evolution under more typical conditions.
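The diminishing-returns argument can be made quantitative with a back-of-the-envelope sketch. It rests on simplifying assumptions, not claims about B-cell biology. Each adaptive step is assumed to sweep deterministically under logistic growth $dp/dt = s\,p(1-p)$, which takes $2\ln(N-1)/s$ generations to go from frequency $1/N$ to $1 - 1/N$. Selection coefficients are assumed to shrink geometrically by a factor `ratio` per step, and clonal interference is ignored. The population size, coefficients and time horizon in the example are hypothetical placeholders.

```python
import math


def sweep_time(s, N):
    """Generations for logistic growth dp/dt = s p (1 - p) to carry an
    allele from frequency 1/N to 1 - 1/N, i.e. 2 ln(N - 1) / s."""
    return 2.0 * math.log(N - 1) / s


def steps_before_stall(s0, ratio, N, horizon):
    """Number of successive sweeps, with selection coefficients
    s0, s0*ratio, s0*ratio**2, ..., that complete within `horizon` generations."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("diminishing returns requires 0 < ratio < 1")
    elapsed, steps, s = 0.0, 0, s0
    while elapsed + sweep_time(s, N) <= horizon:
        elapsed += sweep_time(s, N)
        steps += 1
        s *= ratio
    return steps


# Hypothetical numbers: N = 10**6 + 1 cells, first step s0 = 0.1, each later
# step half as beneficial, and the environment persisting for 2000 generations.
print(round(sweep_time(0.1, 10**6 + 1), 1))          # 276.3
print(steps_before_stall(0.1, 0.5, 10**6 + 1, 2000))  # 3
```

Because each halving of $s$ doubles the sweep time, cumulative time grows geometrically, so even a much longer-lived environment adds only a logarithmic number of extra steps; the specific count depends entirely on the assumed inputs.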

A more typical setting might be the evolution of E. coli in a static fitness landscape. Here, biologists have run long-term experiments tracking a population for over 50,000 generations (Lenski & Travisano, 1994; Cooper & Lenski, 2000; Blount et al., 2012) and continue to find adaptations and marginal increases in fitness. This suggests that a local optimum is not quickly found, even though the environment is static. However, it is difficult to estimate the number of adaptive mutations that fixed in this population, and Lenski (2003) estimated that as few as 100 adaptive point mutations fixed in the first 20,000 generations. It is also hard to rule out that the population traverses small fitness valleys between measurements, which could suggest that the colony is hopping from one easy-to-find local equilibrium to the next.

To answer my own question: we know almost nothing about the structure of fitness landscapes. However, theoretical biologists continue to forge ahead, making arbitrary assumptions about the structure and statistical properties of fitness landscapes. There is a disconnect between theory and data (Orr, 2005; Kryazhimskiy et al., 2009) that reminds me of a similar disconnect in economics. In economics, this has led to a distrust and decline of neoclassical theory. I fear a similar decline in evolutionary biology if theorists continue to make ‘reasonable’ assumptions unconnected to data. Theoretical computer science can help us here, by studying fitness landscapes philosophically as the mental metaphors that they are and making worst-case or minimally-constrained rigorous qualitative statements about them. This has worked to save economics, where algorithmic game theory is the only part of economic theory that works. I hope it can do the same for biology.

This is the second of a series of posts on computational considerations of empirical and theoretical fitness landscapes. In the previous post I gave a historical overview of fitness landscapes as mental and mathematical metaphors.

References

Barrick, J. E., Yu, D. S., Yoon, S. H., Jeong, H., Oh, T. K., Schneider, D., Lenski, R.E., & Kim, J. F. (2009). Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature, 461(7268): 1243-1247.

Beerenwinkel, N., Pachter, L., & Sturmfels, B. (2007a). Epistasis and shapes of fitness landscapes. Statistica Sinica, 17: 1317-1342.

Beerenwinkel, N., Pachter, L., Sturmfels, B., Elena, S.F., & Lenski, R.E. (2007b). Analysis of epistatic interactions and fitness landscapes using a new geometric approach. BMC Evolutionary Biology, 7: 60.

Blount, Z. D., Barrick, J. E., Davidson, C. J., & Lenski, R. E. (2012). Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature, 489: 513-518.

Bull, J. J., Badgett, M. R., Wichman, H. A., Huelsenbeck, J. P., Hillis, D. M., Gulati, A., Ho, C. & Molineux, I. J. (1997). Exceptional convergent evolution in a virus. Genetics, 147(4), 1497-1507.

Chou, H. H., Chiu, H. C., Delaney, N. F., Segrè, D., & Marx, C. J. (2011). Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science, 332(6034): 1190-1192.

Clark, S.H., Huppi, K., Ruezinsky, D., Staudt, L., Gerhard, W., & Weigert, M. (1985). Inter- and intraclonal diversity in the antibody response to influenza hemagglutinin. J. Exp. Med. 161: 687.

Cooper, V.S., & Lenski, R.E. (2000). The population genetics of ecological specialization in evolving Escherichia coli populations. Nature 407: 736-739.

Crews, S., Griffin, J., Huang, H., Calame, K., & Hood, L. (1981). A single V gene segment encodes the immune response to phosphorylcholine: somatic mutation is correlated with the class of the antibody. Cell 25: 59.

Crona, K., Greene, D., & Barlow, M. (2013). The peaks and geometry of fitness landscapes. Journal of Theoretical Biology.

Hall, D. W., Agan, M., & Pope, S. C. (2010). Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of Heredity, 101: S75-S84.

Kauffman, S.A., & Weinberger, E.D. (1989). The NK model of rugged fitness landscapes and its application to maturation of the immune response. Journal of Theoretical Biology, 141(2): 211-245.

Kryazhimskiy, S., Tkacik, G., & Plotkin, J.B. (2009). The dynamics of adaptation on correlated fitness landscapes. Proc. Natl. Acad. Sci. USA 106(44): 18638-18643.

Lenski, R.E., & Travisano, M. (1994). Dynamics of adaptation and diversification: A 10,000-generation experiment with bacterial populations. Proc. Natl. Acad. Sci. USA 91: 6808-6814.

Lenski, R. E. (2003). Phenotypic and genomic evolution during a 20,000-generation experiment with the bacterium Escherichia coli. In J. Janick (Ed.), Plant Breeding Reviews, 24(2): 225-265. New York: Wiley.

Livnat, A., Papadimitriou, C., Dushoff, J., & Feldman, M. W. (2008). A mixability theory for the role of sex in evolution. Proc. Natl. Acad. Sci. USA 105(50): 19803-19808.

Lozovsky, E. R., Chookajorn, T., Brown, K. M., Imwong, M., Shaw, P. J., Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M., & Hartl, D. L. (2009). Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proc. Natl. Acad. Sci. USA, 106(29): 12025-12030.

O’Maille, P. E., Malone, A., Dellas, N., Hess, B. A., Smentek, L., Sheehan, I., Greenhagen, B.T., Chappell, J., Manning, G., & Noel, J.P. (2008). Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature Chemical Biology, 4(10): 617-623.

Orr, H.A. (1998). The population genetics of adaptation: the distribution of factors fixed during adaptive evolution. Evolution 52: 935-949.

Orr, H.A. (2005). The genetic theory of adaptation: a brief history. Nature Reviews Genetics, 6(2): 119-127. PMID: 15716908

Poelwijk, F.J., Kiviet, D.J., Weinreich, D.M., & Tans, S.J. (2007). Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445: 383-386.

Poelwijk, F.J., Tănase-Nicola, S., Kiviet, D.J., & Tans, S.J. (2011). Reciprocal sign epistasis is a necessary condition for multi-peaked fitness landscapes. Journal of Theoretical Biology, 272: 141-144.

Szendro, I. G., Schenk, M. F., Franke, J., Krug, J., & de Visser, J. A. G. (2013). Quantitative analyses of empirical fitness landscapes. Journal of Statistical Mechanics: Theory and Experiment, 2013(01): P01005.

Tonegawa, S. (1983). Somatic generation of antibody diversity. Nature 302: 575.

Weinreich, D.M., Watson, R.A., & Chao, L. (2005). Sign epistasis and genetic constraint on evolutionary trajectories. Evolution, 59: 1165-1174.

Whitlock, M.C., Phillips, P.C., Moore, F.B.-G., & Tonsor, S.J. (1995). Multiple fitness peaks and epistasis. Annu. Rev. Ecol. Syst. 26: 601-629.

From the ivory tower of the School of Computer Science and Department of Psychology at McGill University, I marvel at the world through algorithmic lenses. My specific interests are in quantum computing, evolutionary game theory, modern evolutionary synthesis, and theoretical cognitive science. Previously I was at the Institute for Quantum Computing and Department of Combinatorics & Optimization at the University of Waterloo and a visitor to the Centre for Quantum Technologies at the National University of Singapore.

11 Responses to Epistasis and empirical fitness landscapes

• Thanks for the link, I haven’t read that paper before. It is a pretty standard use of the NK-model and makes all the common assumptions that annoy me: arbitrary choice of gene-interaction (why nearest neighbours and not any other choice?) and assumption that fitness is drawn from a very simple distribution. Nice to see the high mutation limit being studied, but not as interesting when it is only done with simulation — I would really like to see some nice analytic results.

1. Jon Awbrey says:

I can’t be sure, but you may find use for this —

Differential Logic and Dynamic Systems

NB. I’m currently in the process of upgrading the formatting, so that should improve over the next few days.

• How would I use this? I remember briefly looking into process calculus before, but decided it’s really not my thing. Can you give me an example of someone using this to analyze the dynamics of adaptive walks?