Evolution is a special kind of (machine) learning

Theoretical computer science has a long history of peering through the algorithmic lens at the brain, mind, and learning. In fact, I would argue that the field was born from the epistemological questions of what can our minds learn of mathematical truth through formal proofs. The perspective became more scientific with McCullock & Pitts’ (1943) introduction of finite state machines as models of neural networks and Turing’s B-type neural networks paving the way for our modern treatment of artificial intelligence and machine learning. The connections to biology, unfortunately, are less pronounced. Turing ventured into the field with his important work on morphogenesis, and I believe that he could have contributed to the study of evolution but did not get the chance. This work was followed up with the use of computers in biology, and with heuristic ideas from evolution entering computer science in the form of genetic algorithms. However, these areas remained non-mathematical, with very few provable statements or non-heuristic reasoning. The task of making strong connections between theoretical computer science and evolutionary biology has been left to our generation.

ValiantAlthough the militia of cstheorists reflecting on biology is small, Leslie Valiant is their standard-bearer for the steady march of theoretical computer science into both learning and evolution. Due in part to his efforts, artificial intelligence and machine learning are such well developed fields that their theory branch has its own name and conferences: computational learning theory (CoLT). Much of CoLT rests on Valiant’s (1984) introduction of probably-approximately correct (PAC) learning which — in spite of its name — is one of the most formal and careful ways to understand learnability. The importance of this model cannot be understated, and resulted in Valiant receiving (among many other distinctions) the 2010 Turing award (i.e. the Nobel prize of computer science). Most importantly, his attention was not confined only to pure cstheory, he took his algorithmic insights into biology, specifically computational neuroscience (see Valiant (1994; 2006) for examples), to understand human thought and learning.

Like any good thinker reflecting on biology, Valiant understands the importance of Dobzhansky’s observation that “nothing in biology makes sense except in the light of evolution”. Even for the algorithmic lens it helps to have this illumination. Any understanding of learning mechanisms like the brain is incomplete without an examination of the evolutionary dynamics that shaped these organs. In the mid-2000s, Valiant embarked on the quest of formalizing some of the insights cstheory can offer evolution, culminating in his PAC-based model of evolvability (Valiant, 2009). Although this paper is one of the most frequently cited on TheEGG, I’ve waited until today to give it a dedicated post.

Much like the definition of polynomial local search (PLS) (Johnson et al., 1988; Roughgarden, 2010) that I needed to show that evolutionary equilibria aren’t always easy to find (Kaznatcheev, 2013), Valiant’s model starts off with three main ingredients:

  1. An efficient algorithm I that selects an initial genotype. In most cases, this is picked uniformly at random from the set of all genotypes. For simplicity, the genotypes themselves are binary functions \{0,1\}^n \rightarrow \{1,-1\}.
  2. An efficient algorithm F that accepts an environment (given by an ideal function f and probability distribution D) and a candidate genotype r and returns an objective function value:

    \text{Perf}_f(r,D) = \sum_{x \in \{0,1\}^n} D(x) f(x) r(x)

    In words: the correlation between f and r over the distribution D. Given his background in machine learning, Valiant realizes the importance of stochastic effects in evolution and introduces them as sampling noise. The algorithm F doesn’t always return the ideal performance, but instead an extra parameter s is passed. F then returns a stochastic value given by sampling s many times from D and looking at the correlation between f and r on those samples. For biological intuition, we can think of s as encoding some combination of the population size and life-length of the evolving organism.

  3. An efficient algorithm M that accepts a polynomial-sized neighbourhood N and a candidate genotype r and returns an output candidate genotype (in the paper, this is combination of Neigh, Mu, and Evolve). Unlike the definition of PLS, Valiant’s mutation-selection operator does not necessarily return a genotype of strictly higher fitness. Instead, he distinguishes between beneficial and nearly-neutral mutations (Ohta, 1973; 1992). The model has a tolerance t that can depend on the current genotype (although not on the environment given by (f,D), which means it cannot easily simulate the more important multiplicative aspects of relative fitness inherent in the selection coefficient) and classifies a potential mutant r’ as beneficial if \mathbf{F}_{f,D}(r') -  \mathbf{F}_{f,D}(r) \geq  t and neutral if \mathbf{F}_{f,D}(r') - \mathbf{F}_{f,D}(r) \in [-t,t). M always samples a beneficial mutation if one is available, considering a neutral one only if no beneficial mutations are present. Since Valiant’s model makes r and element of its own neighbourhood, it means that there is always something to return unlike the PLS mutator that finishes when it reaches a local fitness peak.

The biggest departure from PLS and biology is the success criterion. Whereas PLS and most biologists are concerned with local fitness peaks, Valiant demands an approximation to the global peak. In particular, a function class C is evolvable if for any D and each f \in C, an algorithm that has the above form (not any arbitrary efficient algorithm as in PLS) can probably (i.e. with probability greater than 1 - \delta) find a genotype r that is approximately correct (i.e. \text{Perf}_{f,D}(r) \geq 1 - \epsilon).

Under this definition, evolvable classes are a subset of efficiently PAC-learnable classes. In fact, they are a subset of a more restricted set of classes that are learnable with statistical queries (SQ; defined by Kearns (1998)) and equivalent to a certain subset of SQ-learnable (Feldman, 2008). This allows Valiant to conclude that some function classes like the parity functions (known as the function class Lin) are not evolvable, even though they are PAC-learnable (Fischer & Simon, 1992; Helmbold et al., 1992). In fact, Valiant proposes this class as a potential falsifier for his theory:

The class Lin may appear to be biologically unnatural. That is exactly the prediction of our theory, which asserts that evolution cannot be based on the evolvability of such a class.

This class does seem somewhat unnatural and synthetic biologists used to agree to some extent, with Tamsir et al. (2011) writing that the closely related XNOR function is empirically impossible to implement. However, they were proven wrong when Bonnet et al. (2012) implemented XNOR among many other amplifying gates by exploiting the structure and process of transcription. Unfortunately, this is an example from synthetic biology, and I don’t know if any examples occur naturally.

I definitely think Valiant (2009) provides a much more convincing model for studying evolution than Chaitin’s (2009) attempt. However, I still fear that even in its full generality its missing some important features, such as the ability to properly capture the Simpson-Baldwin effect (Baldwin 1886, 1902; Simpson, 1953). I also think that some of the conclusions for learning that Valiant draws (especially in his recent book) are not justified in light of the interface theory of perception (Hoffman, 2009), the important effect of subjective representations, and general social learning. Unfortunately, you will have to return later for a more detailed discussion of these critiques.

References

Baldwin, J.M. (1886). A new factor in evolution. Amer. Nat., 30: 441-451, 536-553.

Baldwin, J.M. (1902). Development and evolution. Macmillan, New York.

Bonnet J, Yin P, Ortiz ME, Subsoontorn P, & Endy D (2013). Amplifying Genetic Logic Gates. Science. PMID: 23539178

Chaitin, G. (2009). Evolution of Mutating Software. EATCS Bulletin, 97: 157-164.

Feldman, V. (2008). Evolvability from learning algorithms. In Proceedings of the 40th annual ACM symposium on Theory of Computing (pp. 619-628). ACM.

Fischer, P., & Simon, H. U. (1992). On learning ring-sum-expansions. SIAM Journal on Computing, 21(1): 181-192.

Helmbold, D., Sloan, R., & Warmuth, M. K. (1992). Learning integer lattices. SIAM Journal on Computing, 21(2): 240-266.

Hoffman, D.D. (2009). The interface theory of perception. In: Dickinson, S., Tarr, M., Leonardis, A., & Schiele, B. (Eds.), Object categorization: Computer and human vision perspectives. Cambridge University Press, Cambridge.

Johnson, D.S., Papadimitriou, C.H., & Yannakakis, M. (1988). How easy is local search? Journal of Computer and System Sciences, 37: 79-100.

Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv: 1308.5094v1.

Kearns, M. (1998). Efficiently noise tolerant learning from statistical queries. Journal of the ACM, 45(6): 983-1006.

McCulloch, Warren S., & Pitts, Walter (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133

Ohta, T. (1973). Slightly deleterious mutant substitutions in evolution. Nature, 246(5428): 96-98.

Ohta, T. (1992). The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics, 23: 263-286.

Roughgarden, T. (2010). Computing equilibria: A computational complexity perspective. Economic Theory, 42:193-236.

Simpson, G.G. (1953). The Baldwin effect. Evolution, 7(2): 110-117.

Tamsir, A., Tabor, J. J., & Voigt, C. A. (2011). Robust multicellular computing using genetically encoded NOR gates and chemical wires. Nature, 469(7329): 212-215.

Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM, 27.

Valiant, L.G. (1994). Circuits of the Mind. Oxford University Press.

Valiant, L.G. (2006). A quantitative theory of neural computation. Biological Cybernetics, 95(3): 205-211.

Valiant, L.G. (2009). Evolvability Journal of the ACM, 56 (1) DOI: 10.1145/1462153.1462156

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

18 Responses to Evolution is a special kind of (machine) learning

  1. Ahmadjordan says:

    Genetic algorithms and selection laws are made random to mimic the natural evolution which happens in the creatures . also the Neural network and Fuzzy logic systems are another simulations of the human ability to learn but all of the tend to fail at certain conditions . this could lead to that the creature of the human is definitely perfect.

  2. Pingback: Evolution is a special kind of (machine) learni...

  3. vznvzn says:

    hey A. its great to see some shoutout to ML from the algorithmic lens angle. it seems not so popular in TCS complexity theory (but of course nobody would admit that) & PAC is a great area that unites the two. but note theres a ton of really innovative ML going on outside of PAC. imho Koza deserves a prize for his work with genetic algorithms for example. maybe if someone like that gets a prize, the second-class status in TCS might finally start to wear off.

  4. Pingback: Unrelated to all that, 2/21 edition | neuroecology

  5. Boris Borcic says:

    Hello Artem. The post makes my antennas twitch at the shadow defined by the discrepancy Valiant asserts between evolvable and learnable; and whether what he does can the least justify to imagine human persons to in practice display minds/understanding/ideas with features rather of evolved or rather of learned minds/understanding/ideas.

    Hum, isn’t the situation of our minds-forming that we individually add learning (with whatever power) to an evolved biological and cultural background we inherit, our learning feeding back into the background evolution? Whatever, I must admit to acting here similar to newbies reading code and fooled into believing the machine is in any way commanded by the meaning they themselves find to salient identifiers… (OTOH library entry points dominate coding nowadays, so). Please don’t bother investing much energy into making that clear.

    In another direction, Intelligent Design advocates could be exposed to the idea that whatever designing intelligence they’d manage to prove intervenes, would be an expression of that difference between the learnable and the evolvable (the weakness on “modeling the more important multiplicative aspects of relative fitness” shouldn’t be an issue at all when confronting IDists). I for one could live with “supernatural” intervention in nature, limited to nature displaying against expectations power to the level our best science allows to learning.

    • Hum, isn’t the situation of our minds-forming that we individually add learning (with whatever power) to an evolved biological and cultural background we inherit, our learning feeding back into the background evolution?

      There are two ways we can think about this feedback: change in landscape and individual learning.

      The first way is through our culture and mass actions reshaping the fitness landscape. This is outside the scope of Valiant’s model, since he concentrates on static fitness landscapes, which I think is an acceptable restriction as a base-case to understand (of course, I am biased since my work focuses on static landscapes as well). Sure, it is an idealization, but one that can be explored experimentally, for instance with Lenski’s LTEE and thus not completely irrelevant. If Valiant was pressed, however, he would probably find some way to capture this with his idea of evolutionary target pursuit. This part, however, is not formalized by him and not of interest to me.

      The second way is through the effect of individual learning on fitness; i.e. the Baldwin effect. I don’t think Valiant does this justice (actually, I suspect he is unfamiliar with it and makes the standard assumption that just because individual learning is not transmitted that it is evolutionarily irrelevant), and I am currently working on showing what formal consequences this omission has.

      In another direction, Intelligent Design advocates could be exposed to the idea that whatever designing intelligence they’d manage to prove intervenes, would be an expression of that difference between the learnable and the evolvable.

      If you want to do justice to IDers (although I am not sure why we would want to, since they seem to be seldom interested in honest inquiry :() then bounding the Designer as an efficient PAC-learner is unfair. At the very least, you would have to allow the Designer non-determinism (for me, this is the reasonable way to model ‘magic’, anything more powerful just becomes silly and unproductive to think about). This is the approach that Chaitin takes in contrasting design versus evolution, but he constrains both God and evolution to only follow adaptive paths (i.e. increasing in fitness) which is not something that Valiant demands of either.

      If we allow the strong demand of only adaptive paths then I show a much more powerful result than Chaitin. In particular, my work gives examples like the general NK-model where even non-deterministic choice of adaptive mutations will not help you find a local fitness peak much more efficiently than blind trial-and-error or evolution. On the other hand, I show that on semi-smooth fitness landscapes, evolution is still expected to take an exponential number of adaptive steps while a short adaptive path always exist to the only peak (and thus the non-deterministic designer could always follow it in linear time). Of course, on smooth fitness landscapes both evolution and design find the peak quickly. However, I wouldn’t want to stress these connections to ‘design’ outside of blog posts.

      • Boris Borcic says:

        Thanks for all the goods!

        Indeed IDers… since you’ve just implied that blog post context allowed marginal language, I’d say they remind me of Israel:)

        Skipping over the yet-to-be-completed in-depth exploration of your expositions aligned Wikipedia-like as they are – indeed, unsure of the actual delivery schedule of the humility, zeal, and leisure to read all that perfection to the end – I mean just to contribute a single bit of precision on my own origin position about design in this frame.

        From first-hand experience of compacting code seriously over local compaction minima, I can affirm that compaction strategies can synthesize perfect appearances of design (sub)intentions they never had in mind (in the standard sense). As a consequence, the background focus on Design by the IDers, turns into something of their self-portrait as idolaters.

  6. Pingback: A Fistful of Computational Philosophy

  7. Pingback: Computational theories of evolution | Theory, Evolution, and Games Group

  8. Pingback: Why academics should blog and an update on readership | Theory, Evolution, and Games Group

  9. Pingback: Algorithmic Darwinism | Theory, Evolution, and Games Group

  10. Pingback: Cataloging a year of blogging: cancer and biology | Theory, Evolution, and Games Group

  11. Pingback: Evolution As Machine Learning | Large Numbers

  12. Pingback: Quick links (#30) | Urban Future (2.1)

  13. Pingback: A Fistful of Computational Philosophy

  14. Pingback: Mutation-bias driving the evolution of mutation rates | Theory, Evolution, and Games Group

  15. Pingback: Fitness distributions versus fitness as a summary statistic: algorithmic Darwinism and supply-driven evolution | Theory, Evolution, and Games Group

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.