Computational complexity of evolutionary equilibria
August 31, 2013
The first half of the 20th century is famous for revolutionary changes — paradigm shifts as Kuhn would say — across the sciences and other systems of thought. Disheartened by the scars of the First World War, European thinkers sought refuge by shifting their worldviews away from those of their fathers. In fields like biology, this meant looking to their grandfathers for guidance, instead. The founders of the modern synthesis reconciled the fading ideas of Wallace’s natural selection with Mendelian genetics. In the process, they unified many branches of biology, which had been diverging at the dawn of the 20th century, into a single paradigm that persists today. A return to evolution by natural selection illuminated their field and ended the eclipse of Darwinism. At the same time, mathematicians questioned Hilbert’s formalism and Russell’s logicism as valid foundations for their own field. As a result, they formalized mechanistic calculation and logical thought as computation and founded theoretical computer science to study its power and limitations. Even though some pioneers — like Alan Turing — kept an eye on biology, the two streams of thought did not converge. The coming of the Second World War pushed both fields away from each other and from deep foundational questions, entrenching them in applied and technological work.
For the rest of the 20th century, the fields remained largely independent. Computer science looked to biology only for vague inspiration for heuristics in the form of evolutionary computing (Holland, 1975), and biologists looked at computer science as an engineering or technical field that could only provide them with applied tools like bioinformatics. Neither field saw in the other a partner for addressing deep theoretical questions. As I mentioned previously, my goal is to remedy this by applying ideas from the analysis of algorithms and computational complexity to fundamental questions in biology. Sometime in late October, I tweeted my joy at seeing evolution clearly through the algorithmic lens. On Friday the 23rd, after much delay, I turned this into an arXiv preprint ambitiously titled “Complexity of evolutionary equilibria in static fitness landscapes”. The paper combines the discussion of evolution from my previous few blog posts with three simple technical results. Although it was written for theoretical computer scientists, I tried to make it accessible to mathematical biologists as well; the hope is for it to serve as a launching point for discussion between the two disciplines.
A fitness landscape is a genetic space — with two genotypes adjacent if they differ in a single locus (position) — and a fitness function. Evolutionary dynamics produce a flow on this landscape from lower fitness to higher; reaching equilibrium only if a local fitness peak is found. In a finite fitness landscape, such equilibria always exist, and biologists have tended to use this existence to conclude that the population can quickly reach one of these equilibria. My goal is to question this assumption. Although in section 2.5, I discuss some empirical considerations, my focus is not to say something directly about biology but about the consistency of popular models of biology. As such, I focus on the NK model, not because I know it to be the best model out there, but because it is one of the most popular and a good proxy for the intuition of mathematical biologists.
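To make the NK model concrete, here is a minimal Python sketch (the function names and the random choice of interaction partners and contribution tables are my own illustration, not the paper's construction): each of the n loci contributes a fitness component that depends on its own allele and the alleles of K other loci, and the genotype's fitness is the average of the contributions.

```python
import itertools
import random

def make_nk_landscape(n, k, seed=0):
    """Build a random instance of the NK model: each locus depends on
    itself plus k other loci, via an arbitrary contribution table."""
    rng = random.Random(seed)
    # For each locus i, pick the k other loci it interacts with.
    neighbours = [rng.sample([j for j in range(n) if j != i], k)
                  for i in range(n)]
    # Contribution of locus i for every setting of its k+1 relevant bits.
    tables = [{bits: rng.random()
               for bits in itertools.product((0, 1), repeat=k + 1)}
              for _ in range(n)]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            bits = (genotype[i],) + tuple(genotype[j] for j in neighbours[i])
            total += tables[i][bits]
        return total / n

    return fitness

# A genotype is a tuple of n bits; two genotypes are adjacent
# (one-mutant neighbours) if they differ at a single locus.
f = make_nk_landscape(n=8, k=2)
print(f((0,) * 8))
```

With contributions drawn from [0, 1), every genotype's fitness also lies in [0, 1); the structure of the landscape is entirely determined by the interaction pattern and the tables, which is exactly where the worst-case analysis in the paper gets its leverage.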
In this paper I explain why the biologist’s assumption of being at a local equilibrium is not reasonable. Unlike most other work on fitness landscapes, instead of making arbitrary assumptions about the statistical distributions of fitness contributions and structure, I try to make as few assumptions as possible. Instead of analysis through simulation, I use proofs from computational complexity and the analysis of simplex algorithms. In some ways this is inspired by, but also contrary to, Orr-Gillespie theory. Whereas Orr and Gillespie minimize the number of assumptions and parameters related to the statistical properties of landscapes by ignoring structure, I eliminate assumptions on statistics by focusing on the structure and worst-case behaviour of the landscape.
If we assume that every gene interacts with 2 or more other genes, then what’s the worst that can happen in terms of adaptation times? This assumption is captured by the NK model with K ≥ 2, which I show to be PLS-complete. PLS is short for polynomial local search: a complexity class of discrete search problems where a solution is known to exist, and we don’t need to find the best one, just one that can’t be improved by any small local change. For a problem to be in PLS, it has to have a representation using three polynomial-time algorithms:
- An algorithm I that accepts an instance (like a description of a fitness landscape) and outputs a first candidate to consider (the wild type).
- An algorithm F that accepts an instance and a candidate and returns an objective function value (computes the fitness).
- An algorithm M that accepts an instance and a candidate and returns an output candidate with a strictly higher objective function value, or says that the candidate is a local maximum.
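The three algorithms are easy to sketch in code for a fitness-landscape problem. The following is my own minimal illustration (the dictionary-based instance and the all-zeros wild type are assumptions for the example, not the paper's reduction):

```python
def initial(instance):
    """I: accept an instance and output a first candidate (the wild type)."""
    return (0,) * instance["n"]

def objective(instance, genotype):
    """F: accept an instance and a candidate, return its fitness."""
    return instance["fitness"](genotype)

def better_neighbour(instance, genotype):
    """M: return a strictly fitter one-mutant neighbour, or None if the
    candidate is a local maximum."""
    current = objective(instance, genotype)
    for i in range(instance["n"]):
        mutant = genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
        if objective(instance, mutant) > current:
            return mutant
    return None

# Example instance: fitness = number of 1s (a smooth landscape).
inst = {"n": 5, "fitness": lambda g: sum(g)}
g = initial(inst)
while (nxt := better_neighbour(inst, g)) is not None:
    g = nxt
print(g)  # the unique peak (1, 1, 1, 1, 1)
```

The naive loop at the bottom just follows M from the wild type; as the next paragraph stresses, a solver for a PLS problem is under no obligation to do this.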
We say that an algorithm solves a problem in PLS if, given any instance, it returns a candidate which M recognizes as a local maximum. Note that an algorithm for solving a PLS problem doesn’t need to use the algorithms I, F, M and can rely on any shortcuts it can find directly in the problem instance. In terms of fitness landscapes, this means that a general algorithm need not follow adaptive paths or consider just single mutations. It can be any algorithm whatsoever, including any known mechanism (recombination, populations with horizontal gene transfer via plasmids, multiple mutations, etc.) or yet-to-be-discovered mechanisms. I show that the NK model with K ≥ 2 is PLS-complete; this means that solving it is as hard as solving any problem in PLS. Since most computer scientists believe that there are problems in PLS that are not solvable in polynomial time, this means that there are fitness landscapes with direct interactions confined to just 3 genes (K = 2) such that no mechanism (biological or even arbitrary computation) can find a local equilibrium in a polynomial number of steps.
Of course, like most complexity-class results in computer science, it is not known whether all problems in PLS are solvable in polynomial time (think of the P versus NP question to see the difficulty of proving such a separation); and like any open problem, there are opinions on both sides. For instance, after his presentation in Princeton this Tuesday, noted cstheorist Vijay Vazirani remarked that PLS and its sibling of algorithmic-game-theory fame, PPAD, are polynomial-time solvable for all practical purposes. This means that even though some problems in PLS might not be solvable in polynomial time in the worst case, quick solutions can be found for most instances that we care about. Thus, to a biologist unfamiliar with computational complexity, PLS-completeness might not be convincing evidence for the difficulty of finding local equilibria.
To provide more biologically friendly evidence for the unreachability of local equilibria, I also include a result that is not conditioned on any complexity-theoretic assumptions but only applies to a subset of possible evolutionary mechanisms. The reduction from a variant of weighted 2SAT that I build to prove PLS-completeness is a bijection on adaptive paths, allowing me to use a nice theorem of Schäffer and Yannakakis (1991) to show that in some instances there exist initial configurations from which every adaptive path to a local optimum has exponential length. This means that if we consider evolutionary dynamics that only allow the invasion of fitter mutants (regardless of how those mutants are selected), then there exist fitness landscapes with direct interactions among only 3 genes (K = 2), and wild types in them, from which evolution will not find a local fitness peak in any reasonable amount of time.
A biologist might object that these landscapes are too complicated, and real fitness landscapes always have short adaptive paths. Well, in terms of epistasis, if there is only magnitude epistasis then we have a smooth (also called Mt. Fuji) landscape, and every shortest undirected path (ignoring fitness changes) is also a directed adaptive path; every adaptive mutation takes us a step closer to the unique fitness peak. The smallest change we can consider is to introduce sign epistasis without reciprocal sign epistasis. I prove that this is equivalent to acyclic unique sink orientations from the analysis of simplex algorithms, and to every vertex having at least one of its shortest undirected paths to the unique global maximum be an adaptive path. Yet even on this simple landscape, a standard biological model of adaptation (strong-selection weak-mutation dynamics) still takes a super-polynomial number of steps, in expectation, to find the equilibrium. Not fast enough for the static fitness landscape model to be useful!
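For intuition, here is a minimal sketch of strong-selection weak-mutation dynamics as an adaptive walk (with the simplifying assumption that the fitter mutant fixes uniformly at random, rather than weighted by fixation probability; the function and landscape are my own illustration):

```python
import random

def sswm_walk(n, fitness, start, seed=0):
    """Strong-selection weak-mutation dynamics as an adaptive walk: the
    monomorphic population repeatedly jumps to a randomly chosen fitter
    one-mutant neighbour until none exists (a local peak is reached).
    Uniform choice among fitter mutants is a simplifying assumption."""
    rng = random.Random(seed)
    g, steps = start, 0
    while True:
        fitter = []
        for i in range(n):
            mutant = g[:i] + (1 - g[i],) + g[i + 1:]
            if fitness(mutant) > fitness(g):
                fitter.append(mutant)
        if not fitter:
            return g, steps
        g = rng.choice(fitter)
        steps += 1

# On a smooth (Mt. Fuji) landscape like fitness = number of 1s, every
# adaptive step moves one locus closer to the unique peak, so the walk
# terminates after exactly as many steps as the distance to the peak.
peak, steps = sswm_walk(8, fitness=sum, start=(0,) * 8)
print(peak, steps)  # ((1, 1, 1, 1, 1, 1, 1, 1), 8)
```

On this smooth toy landscape the walk is as fast as possible; the point of the result above is that introducing even non-reciprocal sign epistasis already forces the same dynamics to take super-polynomially many steps on some landscapes.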
Of course, I am not the first to bring considerations of time into evolution. In fact, one of the ushers of the eclipse of Darwinism that the modern synthesis ended was Lord Kelvin’s critique suggesting that the Earth (or Sun) had not been around long enough (by estimates based on the limited physics of the time) for the complexity we see around us to evolve. This argument was shown to have an unreasonable physical basis, but from a modern perspective it was also lacking because there is no clear way to characterize biological complexity or relate it to requisite time scales. Valiant (2009; first appearing on ECCC in 2006) brought computer science into this discussion with his theory of evolvability. He views organisms as protein circuits and evolution as a learning algorithm that is trying to have the organisms approximate an ideal function. This approach provides a beautiful synthesis of machine learning, artificial intelligence, and evolution, but it is not presented in a language familiar to biologists or in a form easily amenable to experimental studies. Hence, it has received little attention and no citations from biologists, with all subsequent work coming from other cstheorists.
Although the topic of time scales in general evolution is closer to Valiant’s work, I drew my inspiration for this approach from Papadimitriou. In his study of mixability as an alternative selected quantity in sexual populations, he embraces existing biological models and metaphors and collaborates with biologists to make his work approachable (Livnat et al., 2008; 2010; 2011; Chastain et al., 2013). I try to emulate this approach and conclude the paper with three possible (not necessarily mutually exclusive) responses for mathematical biologists to my results:
- Abandon the fitness landscape metaphor.
- Redefine or restrict existing fitness landscape models, and show that these modifications are empirically reasonable and that fitness peaks are efficiently reachable in these models.
- Accept that local equilibrium assumptions are unjustified, and develop a theory that is comfortable with organisms that are far from equilibrium and always have more room to adapt.
My preference is the third option, because it expands the scope of biological modeling and questions some of the basic ways we talk about biology. Without the assumption of being at a local equilibrium, we have to change our language from “adapted for” to “adapting to”, which helps avoid teleological tangents. If we are not allowed to assume that organisms are at local fitness peaks, then we have to always think of evolution as a process, not a destination. In many parts of evolutionary thought, this is not a new revelation, but hopefully the rigorous illumination of it through the algorithmic lens is novel. I can’t predict what wonderful paradigm shifts await us in the first half of the 21st century, but I hope that the careful integration of the analysis of algorithms and computational complexity into the fundamental sciences is one of them.
This is the fifth of a series of posts on computational considerations of empirical and theoretical fitness landscapes. In the previous post I gave a historic overview of fitness landscapes as mental and mathematical metaphors, highlighted our empirical knowledge of them, explained the Orr-Gillespie theory for minimizing statistical assumptions, and introduced the NK and block models.
Chastain, E., Livnat, A., Papadimitriou, C., & Vazirani, U. (2013). Multiplicative updates in coordination games and the theory of evolution. Proceedings of the 4th conference on Innovations in Theoretical Computer Science arXiv: 1208.3160.
Holland, J.H. (1975). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press.
Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv: 1308.5094v1.
Livnat, A., Papadimitriou, C., Dushoff, J., & Feldman, M.W. (2008). A mixability theory for the role of sex in evolution. Proceedings of the National Academy of Sciences, USA 105(50): 19803-19808.
Livnat, A., Papadimitriou, C., Pippenger, N., & Feldman, M.W. (2010). Sex, mixability, and modularity. Proceedings of the National Academy of Sciences, USA 107(4): 1452-1457.
Livnat, A., Papadimitriou, C., & Feldman, M.W. (2011). An analytical contrast between fitness maximization and selection for mixability. Journal of Theoretical Biology 273(1): 232-234.
Schäffer, A.A., & Yannakakis, M. (1991). Simple local search problems that are hard to solve. SIAM Journal on Computing, 20(1): 56-87.
Valiant, L.G. (2009). Evolvability. Journal of the ACM 56(1): 3. (First appeared on ECCC in 2006)