March 25, 2014 9 Comments
The workshop on computational theories of evolution started off on Monday, March 17th with Leslie Valiant — one of the organizers — introducing his model of evolvability (Valiant, 2009). This original name was meant to capture what type of complexity can be achieved through evolution. Unfortunately — especially at this workshop — evolvability already had a different, more popular meaning in biology: mechanisms that make an organism or species ‘better’ at evolving, in the sense of higher mutations rates, de novo genes, recombination through sex, etc. As such, we need a better name and I am happy to take on the renaming task.
It seems that one of Valiant’s goals in defining his model was to draw a sharp demarcation between the Lamarkian nature of updates in typical machine learning algorithms, and the more limited inter-generational information transfer that he saw as the defining feature of Darwin’s theory. At this he succeeded spectacularly. In fact, I am inclined to say that his conception of evolution is in some ways more restrictive than that of most biologists. For example, the model does not allow one to easily incorporate epigenetics, development, or the effects of individual learning on fitness and thus evolutionary dynamics. It is an extremely strong Darwinism, and the name should reflect this.
In allusion to machine learning, Machine Darwinism is a tempting candidate. However, for many, machine learning is meant to refer to learning by non-biological machines like your laptop or Google’s servers. But Valiant did not set out to just capture in silico evolution, or genetic algorithms. Computational learning theory carries less of this baggage and the extra benefit of Valiant’s central position in the subfield. Unfortunately, the “computational” prefix does not have the connection to theory and mathematics that it could once evoke, instead it suggests the use of computing machines to achieve some goal like simulation or artificial life rather than finding conceptual and formal insights from the theory of computing machines. Thus, we have to rule out Computational Darwinism.
With no surprise given the title, what is left is my preferred modifier — algorithmic. I have already introduced in in the more general context of philosophy and general biology. It carries with it the connections to mathematics and theory, and the focus on dynamics and specific problems. I think it is a great fit, so after this unreasonably long preamble, I will refer to Valiant’s model as Algorithmic Darwinism. I hope he doesn’t mind.
Earlier on TheEGG blog, we already reviewed the basics of algorithmic Darwinism, so I won’t rehash all the details. I will just remind you that it is a way to formally model evolution as a special type of machine learning, a subset of learning through statistical queries. In the second talk of the day, Vitaly Feldman, went into more detail on the relationship between algorithmic Darwinism and machine learning classes (Feldman, 2008). Here, some unfortunately fragility of the model was revealed, a lock of robustness to changes between relatively reasonable loss functions.
On Thursday, Paul Valiant presented the fitness functions for algorithmic Darwinism most clearly as depending on three parameters: the space of target functions, say (other domains and ranges can be considered, for instance the reals (P. Valiant, 2012)), the distribution over inputs D, and the loss function L. Supposing we are trying to evolve the function and the current organism has a hypothesis then the fitness is given by an empirical approximation of given by m samples from D. Usually, we concentrate on the class of functions F, but it is clear that all three components (along with the mutation operator and class of representations H that define locality) matter for the fitness landscape.
Feldman (2008) showed that if we use the squares loss function then algorithmic Darwinism captures all of statistical query learning (Kearns, 1998). However, if we stick to the correlation loss function that Valiant first considered then the class is equivalent to correlational statistical queries (CSQ) which is known to be a strict subset of SQ (Bshouty & Feldman, 2002). I am not sure how to interpret this discrepancy biology. Regardless of the biological meaning, the connection to known complexity classes has practical significance. Feldman’s results give us a very useful tool. Now we know that any polynomial time CSQ algorithm, no matter how exotic seeming at first, can be transformed into an evolutionary dynamic with the appropriate choice of mutation operator.
The mutation operator is intimately linked to the organism’s internal representations (or, in machine learning terms: hypotheses class) on which it acts. Although algorithmic Darwinism does not place any specific restrictions on how to interpret the organism’s hypotheses class, the usual story is that they correspond to gene expression levels. Elaine Angelino tasked herself with making the model more appealing to biologists by connecting to a function class that real-world gene expression levels represent. She showed that we can think of gene expression levels as ratios of polynomials and that the transcription networks these polynomials encode have small depth and degree (or, in circuit terms: fan-in). The representations used in Valiant’s and Feldman’s result were more complicated that this.
Varun Kanade concentrated on the simplest case of these gene regulatory networks, and showed the learnability of sparse linear functions (Angelino & Kanade, 2014). I particularly enjoyed this result in how it jumped around hardness results in a biologically friendly way. In general, approximating sparse linear functions is NP-hard (Natarajan, 1995), but (like most complexity results) the analysis is worst case, which corresponds to very peculiar distributions over instance space. Angelino & Kanade (2014) avoid this by considering only smoothed distributions where they take an arbitrary distribution over and add a little bit of Gaussian noise to each coordinate. This follows the spirit of Spielman & Teng’s (2001,2009) smoothed analysis and is a model that lets us avoid the problems of worst-case results like knife-edge sensitivity. With smoothed distributions, sparse linear functions are learnable with sparse distribution via a very reasonable mutation operator.
Overall, the five talks on algorithmic Darwinism prompted a lot of discussion with a heavy focus on what (if anything) this model says about biology, and how interested biologists should use it. Given the very strong exclusion of non-genetic effects in the model, I was partial to Chrisantha Fernando‘s suggest that this is a good null model, or in cstheory terms: this model should be used for lower-bounds and seperation from more realistic models. However, in a coffee break discussion over cashews, Vitaly Feldman disagreed with my assessment and suggested that for him the connections to other learning classes are really a source of upper-bounds, i.e. a source of evolutionary algorithms for achieving actual tasks. This engineering application never occurred to me, but with reflection it does seem possible that a deeper understanding of algorithmic Darwinism could yield better directed evolution for tasks like drug design or industrial concerns.
This is the second post in a series from the workshop on computational theories of evolution. The previous post was a general introduction.
Angelino, E., & Kanade, V. (2014). Attribute-efficient evolvability of linear functions<. In Proceedings of the 5th conference on Innovations in Theoretical Computer Science (pp. 287-300). ACM.
Bshouty, N.H., & Feldman, V. (2002). On using extended statistical queries to avoid membership queries. The Journal of Machine Learning Research, 2: 359-395.
Feldman, V. (2008). Evolvability from learning algorithms. Proceedings of the 40th annual ACM symposium on Theory of Computing, 619-628 DOI: 10.1145/1374376.1374465
Kearns, M. (1998). Efficiently noise tolerant learning from statistical queries. Journal of the ACM, 45(6): 983-1006.
Natarajan, B. K. (1995). Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2), 227-234.
Spielman, D., & Teng, S. H. (2001). Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In Proceedings of the thirty-third annual ACM symposium on Theory of computing (pp. 296-305). ACM.
Spielman, D. A., & Teng, S. H. (2009). Smoothed analysis: an attempt to explain the behavior of algorithms in practice. Communications of the ACM, 52(10), 76-84.
Valiant, L.G. (2009) Evolvability. Journal of the ACM, 56(1): 3.
Valiant, P. (2012). Distribution free evolvability of polynomial functions over all convex loss functions. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (pp. 142-148). ACM.