Toward an algorithmic theory of biology
June 9, 2013 5 Comments
When you typically think of computer scientists working on questions in biology, you probably picture a bioinformatician. Although bionformatics makes heavy use of algorithms and machine learning, and its practitioners are often mildly familiar with computational complexity (enough to know that almost everything they study is NP-complete), it doesn’t really apply computational thinking to understand or building theories in biology. Instead, it often generates or analyzes petabytes of blind data that biologists subsequently use to generate or test traditional verbal or physics-inspired hypotheses. The 2nd workshop on Natural Algorithms and the Sciences took a completely different approach.
The workshop was held on May 20th and 21st by Princeton’s Center for Computational Intractability and attracted speakers from biology, computer science, engineering, math and elsewhere. The meeting had a heavy focus on theoretical computer science and a return to the founding spirit of Alan Turing by tackling big fundamental questions in the sciences. It saw applications of computational complexity, computability theory, machine learning, distributed and parallel computing, and information theory. Although the mandate of the workshop is broader than looking at biology, most of the talks returned to questions in the biological sciences. I greatly enjoyed my time at the workshop, and intended to live blog the event. However, a poor internet connection at my residence, other time commitments, and the vast amount of ideas I wanted to cover instead translated into a series of seven posts (this is the eighth) that spanned the last three weeks. To make reading (and later reference) easier, this post is a TL;DR of the last seven posts. Each section is a short summary of a post and a list of the talks discussed; at the end I include a partial bibliography for further reading. Click through on the headings to learn more and join the discussion on specific topics!
Natural algorithms and the sciences
I wrote this post the night before the workshop began, summarizing its aims and my excitement to attend. I explained the basics of natural algorithms and summarized the three broad directions of natural computing: (1) finding metaheuristics to solve engineering problems, (2) designing non-electronics based implementations for computers, and (3) using ideas of theoretical computer science to reason about natural and social phenomena or the theories that describe them. The workshop and subsequent posts focused almost exclusively on the third theme, since it is least developed and holds the most promise for scientific (as opposed to merely technological) progress. I closed the post by stressing the difference between the work to be presented and typical simulations, and sketched the distinction between Eulerian and Lagrangian approaches to modeling.
Algorithmic view of historicity and separation of scales in biology
Featuring summaries of the talks:
- Albert Libchaber: “Experiments on artificial cells“,
- Leslie Valiant: “Algorithms for adaptive phenomena“,
- Simon Levin: “Bounded rationality and decision-making“,
- Naomi Ehrich Leonard: “Network topology and the evolution of leadership in collective migration“, and
- first part of Bernard Chazelle‘s opening remarks.
This post began with an obligatory tip-of-the-hat to the story of Thomas M.S. Chang and the creation of the first artificial cell before highlighting Libchaber’s modern experimental work on synthetic biology. To stress the difference between artificial and natural systems, Valiant looked at evolution as a special type of machine learning and how we can use computer science to study the question of a ‘designer’. I was concerned about Valiant’s reliance on static fitness landscapes; Levin mirrored my concern by starting his talk on the distinction between static and frequency-dependent fitness landscapes in evolution. However, he spent most of his time on questions of development and cognition, closing with herding behavior. Leonard continued the trend to higher orders of organization by looking at how individual cognition leads to the social learning dilemma of group migration. Chazelle generalized Leonard and Levin’s model into influence systems, and showed how it can be used to play with separations of timescales.
Big thanks to the editors at ScienceSeeker.org! They enjoyed this post enough to feature it in their picks.
Computer science on prediction and the edge of chaos
Featuring summaries of the talks:
- Joel Lebowitz: “Microscopic models of macroscopic behavior“,
- Cris Moore: “Universality, hardness, engineering, and messiness“,
- Mark Braverman: “Noise versus computational intractability in dynamics“, and
- second part of Bernard Chazelle’s opening remarks.
We can think of physicists working on statistical mechanics as the first agent-based modellers. Lebowitz sketched how mathematical physics are still continuing their quest (at least for some systems) to rigorously show that macrodynamics fatefully reproduce the aggregate behavior of the microstates. Moore showed that when it comes to computational systems, the tools of statistical mechanics are inadequate to distinguish drastically different problems like 3SAT and 3XOR. Along with Braverman, he considered fundamental obstacles to prediction such as the system being complete for its complexity class, universal, or chaotic. Thankfully, Braverman and Chazelle discussed that the obstacle of universality is often a knife-edge phenomena, and Moore suggested that we can overcome it with more recent cstheory tools like smoothed analysis.
Distributed computation in foraging desert ants
Featuring summaries of the talks:
- Ofer Feinerman: “Fighting noise with limited resources: an ant colony perspective“, and
- Amos Korman: “Integrating theoretical algorithmic ideas in empirical biological study“
Feinerman and Korman work together to combine experiment and computer science theory in studying desert ants. Since these ants do not use pheromone trails to communicate, they must coordinate in the ant hill before heading out for foraging. As such, they are the perfect candidates for looking at distributed computing. Korman sketched a general framework we can use for combining cstheory with natural experiments, and connecting biological parameters by using information-theoretic lower-bounds from computer science. The ant experiments make for some awesome videos, and I recommend watching them all on Feinerman’s research page.
This post caught the attention of WordPress Story Wrangler Cheri Lucas Rowlands. She was kind enough to feature it on the Freshly Pressed section of the WordPress frontpage. Thank you!
Mathematical models of running cockroaches and scale-invariance in cells
Featuring summaries of the talks:
- Phil Holmes: “The neuro-mechanics of running cockroaches: How much natural detail do we need?“, and
- Eduardo Sontag: “Some invariant aspects of dynamical behavior of cell signaling pathways“, or “Transient ‘phenotypes’ for biomolecular model discrimination“
In this post, I introduced the basic distinction between models as abstractions, heuristics, and insilications. Holmes and Sontag’s work is a great example of insilications. Much like we are used to in physics, their applied math work is detailed enough to provide quantitative predictions that are directly testable against the relevant biology. Both speakers stressed the importance of symmetries and invariants for studying dynamic systems, and Sontag’s work provided a useful theorem for detecting scale-invariance of response curves for the giant systems of ODEs used to model cells. After 15 years of experience modeling how cockroaches run, Holmes stressed an important general note for all mathematical modelers:
[We] still don’t understand non-linear dynamics beyond linearizations or general symmetries … [and] it is not clear which details matters a priori.
Microscopic computing in cells and with self-assembling DNA tiles
Featuring summaries of the talks:
- Luca Cardelli: “The cell cycle switch computes approximate majority“,
- Jack Lutz: “Parametrizing self-assembly“,
- Rebecca Schulman: “Universal molecular algorithms for learning and pattern formation“
This was the only series of talks that was not focused exclusively on using cstheory to understand natural sciences, but also on implementing computing or assembling technologies using insights gleaned from biology. Caredlli simulated and analyzed cellular computational circuits that — for computer scientists — define biology to show that they accurately implement the approximate majority algorithm from distributed computing. Lutz and Schulman looked even deeper inside the cell, to see how they can harness the self-assembly of DNA tiles to build simple nanoscale biological replicators. Lutz concentrated on the theory of faithful simulation of Turing Machines by DNA tiling systems. Schulman looked at how to experimentally realize these simple self-replicating systems. I was surprised to learn that DNA tile self-assembly can be run in the wet-lab!
Machine learning and prediction without understanding
Featuring a summary of the talk:
- Ishanu Chattopadhyay: “Data smashing: Finding causal similarity in natural data sets“
This was the only purely machine learning talk of the workshop. Since the goal of many talks at the meeting was to look at how to build theory and understanding in biology, I used this post as an opportunity to turn to some philosophy of science. Is big data good for us? Or will it stifle progress in the long term? For theoria, I asked if the proper versus improper learning distinction in computational learning theory can help frame the philosophical discussion over the merit of understanding versus prediction. For practica, I sketched the ideas behind Chattonpadhyay’s approach to anomaly detection directly with streams without explicitly learning the underlying probabilistic finite state automata.
This was the most popular post of the series on reddit, twitter, and Google Plus. It became TheEGG’s most popular post with 3890 views to date. At the time of this summary, all seven previous posts together garnered around 7.8 thousand views and hopefully helped popularize the algorithmic lens among readers!
Partial Bibliography
Barish, R. D., Schulman, R., Rothemund, P. W., & Winfree, E. (2009). An information-bearing seed for nucleating algorithmic self-assembly. Proceedings of the National Academy of Sciences, 106(15), 6054-6059.
Cardelli L, & Csikász-Nagy A (2012). The cell cycle switch computes approximate majority. Scientific Reports, 2
Chattopadhyay, I., Wen, Y., & Ray, A. (2010). Pattern Classification In Symbolic Streams via Semantic Annihilation of Information. American Control Conference arXiv: 1008.3667v1
Chazelle, B. (2012). Natural algorithms and influence systems. Communications of the ACM, 55 (12).
Doty, D., Lutz, J. H., Patitz, M. J., Schweller, R. T., Summers, S. M., & Woods, D. (2012). The tile assembly model is intrinsically universal. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on (pp. 302-310). IEEE.
Feinerman, O., & Korman, A. (2012). Memory Lower Bounds for Randomized Collaborative Search and Implications to Biology. 26th International Symposium on Distributed Computing (DISC)
Pais, D., & Leonard, N. (2013). Adaptive network dynamics and evolution of leadership in collective migration Physica D: Nonlinear Phenomena DOI: 10.1016/j.physd.2013.04.014
Schulman, R., & Winfree, E. (2005). Self-replication and evolution of DNA crystals. In Advances in Artificial Life (pp. 734-743). Springer Berlin Heidelberg.
Shoval O, Goentoro L, Hart Y, Mayo A, Sontag E, & Alon U (2010). Fold-change detection and scalar symmetry of sensory input fields. Proceedings of the National Academy of Sciences of the United States of America, 107(36): 15995-6000.
Valiant, L.G. (2009) Evolvability. Journal of the ACM 56(1): 3.










Machine learning and prediction without understanding
June 5, 2013 14 Comments
Big data is the buzzword du jour, permuting from machine learning to hadoop powered distributed computing, from giant scientific projects to individual social science studies, and from careful statistics to the witchcraft of web-analytics. As we are overcome by petabytes of data and as more of it becomes public, it is tempting for a would-be theorist to simply run machine learning and big-data algorithms on these data sets and take the computer’s conclusions as understanding. I think this has the danger of overshadowing more traditional approaches to theory and the feedback between theory and experiment.
Of course, I am not saying anything new, Chomsky has famously expressed his skepticism along similar lines with the prevalence of Machine Learning in linguistics research:
In this case, though, it seems that Chomsky might be fighting a losing battle. A lot of the funding for research is driven by results and predictions, and — for many — understanding is just an inessential step in that process. In linguistics, this is best summarized by a quip attributed to Frederick Jelinek:
Jelinek is referring to the advantage that purely statistical machine learning methods provide over explicit linguistic models. Coincidently, machine learning can also provide us with the tools to formally look at this debate. We can model learning a theory from data as proper learning: our real world is given by some real function
and our algorithm tries to find some function
that closely approximates this. This function class
is usually something structured that we can understand, like deterministic finite state machines, Markov decision processes, or probabilistic finite state automata. However, we can make the learning algorithm’s job easier by giving it more freedom in the representations it uses, by asking for a hypothesis
. In fact, we can consider the most general hypotheses class by asking the learning algorithm to generate a prediction instead of a theory. This choice makes the algorithm itself the hypothesis and thus the hypotheses class is equivalent to all efficiently computable functions. This is how a computer scientist can operationalize prediction and understanding. In a classic result, Pitt & Valiant (1988) showed that there are certain concept classes that cannot be efficiently properly learnt, but that we can learn to predict. In other words, it really is easier to predict than to understand.
Chattopadhyay, Wen, & Ray (2010) considered a processes that is governed by (or well approximated by) probabilistic finite state automata (PFSAs). We can think of this source as a PFSA inside black box: we can press a button that will generate a symbolic stream of observations but we cannot peek inside to see how the machine works. The authors also assume there are given a finite number of candidate PFSAs that they can examine as potential candidates for approximating the environment, their goal is to determine (in an online manner) , which candidate approximates the black box best and to what extent. Before they start receiving data from the black box, the authors use the candidate PFSAs to build annihilators that serve as a type of filter for the incoming data. The data that makes it through the filter is then compared to the simplest model that just outputs uncorrelated letters uniformly at random, and if it matches this behavior then the PSFA that generated this annihilator is a good approximation of the data.
More formally, the authors build an abelian group on the space of probability measures that induces a group structure on a restricted set of irreducible PFSAs. The restriction is fundamental since if this procedure worked for reducible PFSAs then it could be used to break the RSA cryptosystem — that secures all our online transactions — by the method of Kearns & Valiant (1994). The group operation corresponds to their annihilation procedure on data streams, the identity element in the null PFSA that generates symbols uniformly at random, and the inverse is computed during their preprocessing of the candidate PFSAs into annihilation procedures. The algorithm comes down to adding the unknown stream to inverses of the candidates in this abelian group and seeing if the result matches the identity. If the resulting stream (statistically) matches the identity then the inverted element must have been (a close approximation of) our unknown function.
The above procedure is efficient but inadvertently builds a PFSA representation of the original data — it provides some understanding. In his talk, Chattopadhyay showed that the procedure can be extended to the problem of comparing two data sources without first learning either one’s representation as a PFSA. We can compute the inverse directly on the raw stream data instead of first building a PFSA and then inverting it. Unfortunately, if we have
output symbols then we need
independent instances of the stream we want to invert. However, in this understanding-free setting, the total number of samples needed to determine with high accuracy if two sources are generated by the same process is lower than if we had to gain understanding by first building the PFSA. It is easier to compare than to understand.
This work has an obvious application in fields like finance, where a real model is impossible to come by. A trader can use old data to create the inverted stream annihilator and use it to annihilate new data tic-by-tic as it comes in to detect anomalies: the process suddenly shifting from its historic process to a new one. In the rest of his talk, Chattopadhyay stressed similar successes for his approach in other domains, and showed that it can match the behavior of much more carefully constructed understanding-rich models of anomaly detection in domains like the detection of gravitational waves.
If we can skip understanding and just get quick results, then why bother with theory and analytic treatments? It is tempting to say that more data and statistical analysis can’t possibly hurt (except maybe the opportunity cost of collecting it), they just give more playthings for theorists. But for me, the real problem is that a focus on data changes priorities. Scientists (due mostly to science funding) start to move away from trying to explain and understand phenomena and more to collecting data and data-mining it to make predictions without understanding. A focus on blind statistics on big data often provides great practical results in the short term (and that is why it wins funding) at the expense of the understanding needed for long term development of science. In particular I suspect that after a short success it will produce a field that doesn’t know how to ask new and interesting questions. A field that must rely on continuing to mine slight variations on the same sort of data for harder and harder to find practical gems. Unfortunately, this social consideration is not built into our machine learning based analysis.
Note: this is the sixth (and last!) review blogpost of a series (1, 2, 3, 4, and 5) on the recent workshop on Natural Algorithms and the Sciences at Princeton’s Center for Computational Intractability. Still to come is a general overview post summarizing the six reviews and making them easier to navigate.
References
Chattopadhyay, Ishanu, Wen, Yicheng, & Ray, Asok (2010). Pattern Classification In Symbolic Streams via Semantic Annihilation of Information American Control Conference arXiv: 1008.3667v1
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM, 41(1), 67-95.
Pitt, L., & Valiant, L. G. (1988). Computational limitations on learning from examples. Journal of the ACM, 35(4), 965-984.
Filed under Commentary, Reviews, Technical Tagged with artificial intelligence, conference, cstheory, finance, machine learning, philosophy of science