Evolution is a special kind of (machine) learning
February 14, 2014
Theoretical computer science has a long history of peering through the algorithmic lens at the brain, mind, and learning. In fact, I would argue that the field was born from the epistemological question of what our minds can learn of mathematical truth through formal proofs. The perspective became more scientific with McCulloch & Pitts' (1943) introduction of finite state machines as models of neural networks and Turing's B-type neural networks paving the way for our modern treatment of artificial intelligence and machine learning. The connections to biology, unfortunately, are less pronounced. Turing ventured into the field with his important work on morphogenesis, and I believe that he could have contributed to the study of evolution but did not get the chance. This work was followed up with the use of computers in biology, and with heuristic ideas from evolution entering computer science in the form of genetic algorithms. However, these areas remained non-mathematical, with very few provable statements or non-heuristic reasoning. The task of making strong connections between theoretical computer science and evolutionary biology has been left to our generation.
Although the militia of cstheorists reflecting on biology is small, Leslie Valiant is their standard-bearer for the steady march of theoretical computer science into both learning and evolution. Due in part to his efforts, artificial intelligence and machine learning are such well developed fields that their theory branch has its own name and conferences: computational learning theory (CoLT). Much of CoLT rests on Valiant's (1984) introduction of probably-approximately correct (PAC) learning which, in spite of its name, is one of the most formal and careful ways to understand learnability. The importance of this model cannot be overstated, and it resulted in Valiant receiving (among many other distinctions) the 2010 Turing award (i.e. the Nobel prize of computer science). Most importantly, his attention was not confined only to pure cstheory; he took his algorithmic insights into biology, specifically computational neuroscience (see Valiant (1994; 2006) for examples), to understand human thought and learning.
Like any good thinker reflecting on biology, Valiant understands the importance of Dobzhansky’s observation that “nothing in biology makes sense except in the light of evolution”. Even for the algorithmic lens it helps to have this illumination. Any understanding of learning mechanisms like the brain is incomplete without an examination of the evolutionary dynamics that shaped these organs. In the mid-2000s, Valiant embarked on the quest of formalizing some of the insights cstheory can offer evolution, culminating in his PAC-based model of evolvability (Valiant, 2009). Although this paper is one of the most frequently cited on TheEGG, I’ve waited until today to give it a dedicated post.
Much like the definition of polynomial local search (PLS) (Johnson et al., 1988; Roughgarden, 2010) that I needed to show that evolutionary equilibria aren’t always easy to find (Kaznatcheev, 2013), Valiant’s model starts off with three main ingredients:
- An efficient algorithm I that selects an initial genotype. In most cases, this is picked uniformly at random from the set of all genotypes. For simplicity, the genotypes themselves are binary functions $r : \{0,1\}^n \rightarrow \{-1,1\}$.
- An efficient algorithm F that accepts an environment (given by an ideal function f and probability distribution D) and a candidate genotype r and returns an objective function value: $F(f,D,r) = \sum_{x} D(x)f(x)r(x) = \mathbb{E}_{x \sim D}[f(x)r(x)]$.
In words: the correlation between f and r over the distribution D. Given his background in machine learning, Valiant realizes the importance of stochastic effects in evolution and introduces them as sampling noise. The algorithm F doesn't return this ideal performance directly; instead, an extra parameter s is passed, and F returns a stochastic value obtained by drawing s samples from D and computing the correlation between f and r on those samples. For biological intuition, we can think of s as encoding some combination of the population size and lifespan of the evolving organism.
- An efficient algorithm M that accepts a polynomial-sized neighbourhood N and a candidate genotype r and returns an output candidate genotype (in the paper, this is a combination of Neigh, Mu, and Evolve). Unlike the definition of PLS, Valiant's mutation-selection operator does not necessarily return a genotype of strictly higher fitness. Instead, he distinguishes between beneficial and nearly-neutral mutations (Ohta, 1973; 1992). The model has a tolerance t that can depend on the current genotype (although not on the environment given by (f,D), which means it cannot easily simulate the more important multiplicative aspects of relative fitness inherent in the selection coefficient) and classifies a potential mutant r' as beneficial if $F(f,D,r') \geq F(f,D,r) + t$ and neutral if $|F(f,D,r') - F(f,D,r)| < t$. M always samples a beneficial mutation if one is available, considering a neutral one only if no beneficial mutations are present. Since Valiant's model makes r an element of its own neighbourhood, there is always something to return, unlike the PLS mutator, which finishes when it reaches a local fitness peak.
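To make these ingredients concrete, here is a minimal Python sketch of the objective F and the mutator M. This is my own illustrative construction, not code from Valiant (2009): the names (`performance`, `empirical_performance`, `mutate`) and the representation of D as a list of (outcome, probability) pairs are assumptions for the sake of the example.

```python
import random

def performance(f, r, D):
    """The ideal objective F: correlation of the ideal function f and
    genotype r over distribution D, a list of (outcome, probability) pairs."""
    return sum(p * f(x) * r(x) for x, p in D)

def empirical_performance(f, r, D, s, rng):
    """Valiant's noisy F: estimate the correlation from s samples of D,
    standing in for the stochasticity of a finite population."""
    xs, ps = zip(*D)
    sample = rng.choices(xs, weights=ps, k=s)
    return sum(f(x) * r(x) for x in sample) / s

def mutate(r, neighbours, f, D, s, t, rng):
    """The mutator M with tolerance t: return a beneficial mutant if one
    exists (empirical gain >= t), otherwise a nearly-neutral one
    (|gain| < t); deleterious mutants are discarded."""
    base = empirical_performance(f, r, D, s, rng)
    beneficial, neutral = [], []
    for rp in neighbours(r):
        gain = empirical_performance(f, rp, D, s, rng) - base
        if gain >= t:
            beneficial.append(rp)
        elif abs(gain) < t:
            neutral.append(rp)
    # r is in its own neighbourhood, so there is always something to return;
    # the final fallback only guards against extreme sampling noise.
    pool = beneficial or neutral or [r]
    return rng.choice(pool)
```

Note that, as in the text, selection here is beneficial-first: neutral mutants are only considered when no mutant clears the tolerance.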
The biggest departure from PLS and biology is the success criterion. Whereas PLS and most biologists are concerned with local fitness peaks, Valiant demands an approximation to the global peak. In particular, a function class C is evolvable if for any D and each $\epsilon > 0$, an algorithm of the above form (not an arbitrary efficient algorithm as in PLS) can probably (i.e. with probability greater than $1 - \epsilon$) find a genotype r that is approximately correct (i.e. $F(f,D,r) \geq 1 - \epsilon$).
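As a toy illustration of this success criterion, the following self-contained sketch evolves monotone conjunctions (a canonical evolvable class in Valiant's paper) under the uniform distribution. It is my own construction: it uses exact correlations (the idealized limit of large sample size s) rather than Valiant's sampled F, and single-variable toggles as the neighbourhood.

```python
import random
from itertools import product

def conj(S, x):
    """Monotone conjunction: +1 iff every variable indexed by S is 1 in x."""
    return 1 if all(x[i] for i in S) else -1

def exact_performance(S, T, n):
    """Correlation of genotype conjunction S with ideal conjunction T
    under the uniform distribution on {0,1}^n."""
    xs = list(product((0, 1), repeat=n))
    return sum(conj(S, x) * conj(T, x) for x in xs) / len(xs)

def evolve(n, target, generations=10, t=0.1, seed=0):
    """Beneficial-first mutation-selection over single-variable toggles."""
    rng = random.Random(seed)
    S = frozenset()  # initial genotype: the empty (always-true) conjunction
    for _ in range(generations):
        base = exact_performance(S, target, n)
        beneficial, neutral = [], [S]  # S is in its own neighbourhood
        for i in range(n):
            Sp = S ^ {i}  # toggle variable i in or out of the conjunction
            gain = exact_performance(Sp, target, n) - base
            if gain >= t:
                beneficial.append(Sp)
            elif abs(gain) < t:
                neutral.append(Sp)
        S = rng.choice(beneficial if beneficial else neutral)
    return S, exact_performance(S, target, n)
```

In this tiny landscape every run climbs to the global peak, so the final genotype is approximately correct for any $\epsilon$; the interesting content of Valiant's definition is that this must keep happening in polynomially many generations as n grows.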
Under this definition, evolvable classes are a subset of efficiently PAC-learnable classes. In fact, they are a subset of a more restricted set of classes that are learnable with statistical queries (SQ; defined by Kearns (1998)) and equivalent to a certain subset of SQ-learnable (Feldman, 2008). This allows Valiant to conclude that some function classes like the parity functions (known as the function class Lin) are not evolvable, even though they are PAC-learnable (Fischer & Simon, 1992; Helmbold et al., 1992). In fact, Valiant proposes this class as a potential falsifier for his theory:
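A quick way to see why Lin resists this kind of mutation-selection process: over the uniform distribution, distinct parity functions are exactly orthogonal, so a mutant's performance against a parity target carries no signal about how close it is to the ideal function, and there is no gradient for beneficial mutations to climb. Here is a small sketch of this standard Fourier-analytic fact (the code and names are my own):

```python
from itertools import product

def parity(S, x):
    """chi_S(x): +1 if the bits of x indexed by S have even sum, else -1."""
    return 1 - 2 * (sum(x[i] for i in S) % 2)

def correlation(S, T, n):
    """Exact correlation of chi_S and chi_T under the uniform distribution
    on {0,1}^n: 1 when S = T, and exactly 0 otherwise."""
    xs = list(product((0, 1), repeat=n))
    return sum(parity(S, x) * parity(T, x) for x in xs) / len(xs)
```

Every non-matching genotype sits at performance 0, so a tolerance-based mutator sees only a flat plateau until it happens to hit the target exactly, which takes exponentially many guesses.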
The class Lin may appear to be biologically unnatural. That is exactly the prediction of our theory, which asserts that evolution cannot be based on the evolvability of such a class.
This class does seem somewhat unnatural, and synthetic biologists used to agree to some extent, with Tamsir et al. (2011) writing that the closely related XNOR function was empirically impossible to implement. However, they were proven wrong when Bonnet et al. (2013) implemented XNOR, among many other amplifying gates, by exploiting the structure and process of transcription. Unfortunately, this is an example from synthetic biology, and I don't know if any examples occur naturally.
I definitely think Valiant (2009) provides a much more convincing model for studying evolution than Chaitin's (2009) attempt. However, I still fear that even in its full generality it's missing some important features, such as the ability to properly capture the Simpson-Baldwin effect (Baldwin, 1896; 1902; Simpson, 1953). I also think that some of the conclusions for learning that Valiant draws (especially in his recent book) are not justified in light of the interface theory of perception (Hoffman, 2009), the important effect of subjective representations, and general social learning. Unfortunately, you will have to return later for a more detailed discussion of these critiques.
Baldwin, J.M. (1896). A new factor in evolution. The American Naturalist, 30: 441-451, 536-553.
Baldwin, J.M. (1902). Development and evolution. Macmillan, New York.
Bonnet, J., Yin, P., Ortiz, M.E., Subsoontorn, P., & Endy, D. (2013). Amplifying genetic logic gates. Science. PMID: 23539178.
Chaitin, G. (2009). Evolution of Mutating Software. EATCS Bulletin, 97: 157-164.
Feldman, V. (2008). Evolvability from learning algorithms. In Proceedings of the 40th annual ACM symposium on Theory of Computing (pp. 619-628). ACM.
Fischer, P., & Simon, H. U. (1992). On learning ring-sum-expansions. SIAM Journal on Computing, 21(1): 181-192.
Helmbold, D., Sloan, R., & Warmuth, M. K. (1992). Learning integer lattices. SIAM Journal on Computing, 21(2): 240-266.
Hoffman, D.D. (2009). The interface theory of perception. In: Dickinson, S., Tarr, M., Leonardis, A., & Schiele, B. (Eds.), Object categorization: Computer and human vision perspectives. Cambridge University Press, Cambridge.
Johnson, D.S., Papadimitriou, C.H., & Yannakakis, M. (1988). How easy is local search? Journal of Computer and System Sciences, 37: 79-100.
Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv: 1308.5094v1.
Kearns, M. (1998). Efficiently noise tolerant learning from statistical queries. Journal of the ACM, 45(6): 983-1006.
McCulloch, W.S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5: 115-133.
Ohta, T. (1973). Slightly deleterious mutant substitutions in evolution. Nature, 246(5428): 96-98.
Ohta, T. (1992). The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics, 23: 263-286.
Roughgarden, T. (2010). Computing equilibria: A computational complexity perspective. Economic Theory, 42:193-236.
Simpson, G.G. (1953). The Baldwin effect. Evolution, 7(2): 110-117.
Tamsir, A., Tabor, J. J., & Voigt, C. A. (2011). Robust multicellular computing using genetically encoded NOR gates and chemical wires. Nature, 469(7329): 212-215.
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM, 27(11): 1134-1142.
Valiant, L.G. (1994). Circuits of the Mind. Oxford University Press.
Valiant, L.G. (2006). A quantitative theory of neural computation. Biological Cybernetics, 95(3): 205-211.
Valiant, L.G. (2009). Evolvability. Journal of the ACM, 56(1). DOI: 10.1145/1462153.1462156.