Toward an algorithmic theory of biology
June 9, 2013 20 Comments
When you typically think of computer scientists working on questions in biology, you probably picture a bioinformatician. Although bionformatics makes heavy use of algorithms and machine learning, and its practitioners are often mildly familiar with computational complexity (enough to know that almost everything they study is NP-complete), it doesn’t really apply computational thinking to understand or building theories in biology. Instead, it often generates or analyzes petabytes of blind data that biologists subsequently use to generate or test traditional verbal or physics-inspired hypotheses. The 2nd workshop on Natural Algorithms and the Sciences took a completely different approach.
The workshop was held on May 20th and 21st by Princeton’s Center for Computational Intractability and attracted speakers from biology, computer science, engineering, math and elsewhere. The meeting had a heavy focus on theoretical computer science and a return to the founding spirit of Alan Turing by tackling big fundamental questions in the sciences. It saw applications of computational complexity, computability theory, machine learning, distributed and parallel computing, and information theory. Although the mandate of the workshop is broader than looking at biology, most of the talks returned to questions in the biological sciences. I greatly enjoyed my time at the workshop, and intended to live blog the event. However, a poor internet connection at my residence, other time commitments, and the vast amount of ideas I wanted to cover instead translated into a series of seven posts (this is the eighth) that spanned the last three weeks. To make reading (and later reference) easier, this post is a TL;DR of the last seven posts. Each section is a short summary of a post and a list of the talks discussed; at the end I include a partial bibliography for further reading. Click through on the headings to learn more and join the discussion on specific topics!
I wrote this post the night before the workshop began, summarizing its aims and my excitement to attend. I explained the basics of natural algorithms and summarized the three broad directions of natural computing: (1) finding metaheuristics to solve engineering problems, (2) designing non-electronics based implementations for computers, and (3) using ideas of theoretical computer science to reason about natural and social phenomena or the theories that describe them. The workshop and subsequent posts focused almost exclusively on the third theme, since it is least developed and holds the most promise for scientific (as opposed to merely technological) progress. I closed the post by stressing the difference between the work to be presented and typical simulations, and sketched the distinction between Eulerian and Lagrangian approaches to modeling.
- Albert Libchaber: “Experiments on artificial cells“,
- Leslie Valiant: “Algorithms for adaptive phenomena“,
- Simon Levin: “Bounded rationality and decision-making“,
- Naomi Ehrich Leonard: “Network topology and the evolution of leadership in collective migration“, and
- first part of Bernard Chazelle‘s opening remarks.
This post began with an obligatory tip-of-the-hat to the story of Thomas M.S. Chang and the creation of the first artificial cell before highlighting Libchaber’s modern experimental work on synthetic biology. To stress the difference between artificial and natural systems, Valiant looked at evolution as a special type of machine learning and how we can use computer science to study the question of a ‘designer’. I was concerned about Valiant’s reliance on static fitness landscapes; Levin mirrored my concern by starting his talk on the distinction between static and frequency-dependent fitness landscapes in evolution. However, he spent most of his time on questions of development and cognition, closing with herding behavior. Leonard continued the trend to higher orders of organization by looking at how individual cognition leads to the social learning dilemma of group migration. Chazelle generalized Leonard and Levin’s model into influence systems, and showed how it can be used to play with separations of timescales.
- Joel Lebowitz: “Microscopic models of macroscopic behavior“,
- Cris Moore: “Universality, hardness, engineering, and messiness“,
- Mark Braverman: “Noise versus computational intractability in dynamics“, and
- second part of Bernard Chazelle’s opening remarks.
We can think of physicists working on statistical mechanics as the first agent-based modellers. Lebowitz sketched how mathematical physics are still continuing their quest (at least for some systems) to rigorously show that macrodynamics fatefully reproduce the aggregate behavior of the microstates. Moore showed that when it comes to computational systems, the tools of statistical mechanics are inadequate to distinguish drastically different problems like 3SAT and 3XOR. Along with Braverman, he considered fundamental obstacles to prediction such as the system being complete for its complexity class, universal, or chaotic. Thankfully, Braverman and Chazelle discussed that the obstacle of universality is often a knife-edge phenomena, and Moore suggested that we can overcome it with more recent cstheory tools like smoothed analysis.
- Ofer Feinerman: “Fighting noise with limited resources: an ant colony perspective“, and
- Amos Korman: “Integrating theoretical algorithmic ideas in empirical biological study“
Feinerman and Korman work together to combine experiment and computer science theory in studying desert ants. Since these ants do not use pheromone trails to communicate, they must coordinate in the ant hill before heading out for foraging. As such, they are the perfect candidates for looking at distributed computing. Korman sketched a general framework we can use for combining cstheory with natural experiments, and connecting biological parameters by using information-theoretic lower-bounds from computer science. The ant experiments make for some awesome videos, and I recommend watching them all on Feinerman’s research page.
- Phil Holmes: “The neuro-mechanics of running cockroaches: How much natural detail do we need?“, and
- Eduardo Sontag: “Some invariant aspects of dynamical behavior of cell signaling pathways“, or “Transient ‘phenotypes’ for biomolecular model discrimination“
In this post, I introduced the basic distinction between models as abstractions, heuristics, and insilications. Holmes and Sontag’s work is a great example of insilications. Much like we are used to in physics, their applied math work is detailed enough to provide quantitative predictions that are directly testable against the relevant biology. Both speakers stressed the importance of symmetries and invariants for studying dynamic systems, and Sontag’s work provided a useful theorem for detecting scale-invariance of response curves for the giant systems of ODEs used to model cells. After 15 years of experience modeling how cockroaches run, Holmes stressed an important general note for all mathematical modelers:
[We] still don’t understand non-linear dynamics beyond linearizations or general symmetries … [and] it is not clear which details matters a priori.
- Luca Cardelli: “The cell cycle switch computes approximate majority“,
- Jack Lutz: “Parametrizing self-assembly“,
- Rebecca Schulman: “Universal molecular algorithms for learning and pattern formation“
This was the only series of talks that was not focused exclusively on using cstheory to understand natural sciences, but also on implementing computing or assembling technologies using insights gleaned from biology. Caredlli simulated and analyzed cellular computational circuits that — for computer scientists — define biology to show that they accurately implement the approximate majority algorithm from distributed computing. Lutz and Schulman looked even deeper inside the cell, to see how they can harness the self-assembly of DNA tiles to build simple nanoscale biological replicators. Lutz concentrated on the theory of faithful simulation of Turing Machines by DNA tiling systems. Schulman looked at how to experimentally realize these simple self-replicating systems. I was surprised to learn that DNA tile self-assembly can be run in the wet-lab!
- Ishanu Chattopadhyay: “Data smashing: Finding causal similarity in natural data sets“
This was the only purely machine learning talk of the workshop. Since the goal of many talks at the meeting was to look at how to build theory and understanding in biology, I used this post as an opportunity to turn to some philosophy of science. Is big data good for us? Or will it stifle progress in the long term? For theoria, I asked if the proper versus improper learning distinction in computational learning theory can help frame the philosophical discussion over the merit of understanding versus prediction. For practica, I sketched the ideas behind Chattonpadhyay’s approach to anomaly detection directly with streams without explicitly learning the underlying probabilistic finite state automata.
This was the most popular post of the series on reddit, twitter, and Google Plus. It became TheEGG’s most popular post with 3890 views to date. At the time of this summary, all seven previous posts together garnered around 7.8 thousand views and hopefully helped popularize the algorithmic lens among readers!
Barish, R. D., Schulman, R., Rothemund, P. W., & Winfree, E. (2009). An information-bearing seed for nucleating algorithmic self-assembly. Proceedings of the National Academy of Sciences, 106(15), 6054-6059.
Cardelli L, & Csikász-Nagy A (2012). The cell cycle switch computes approximate majority. Scientific Reports, 2
Chattopadhyay, I., Wen, Y., & Ray, A. (2010). Pattern Classification In Symbolic Streams via Semantic Annihilation of Information. American Control Conference arXiv: 1008.3667v1
Chazelle, B. (2012). Natural algorithms and influence systems. Communications of the ACM, 55 (12).
Doty, D., Lutz, J. H., Patitz, M. J., Schweller, R. T., Summers, S. M., & Woods, D. (2012). The tile assembly model is intrinsically universal. In Foundations of Computer Science (FOCS), 2012 IEEE 53rd Annual Symposium on (pp. 302-310). IEEE.
Feinerman, O., & Korman, A. (2012). Memory Lower Bounds for Randomized Collaborative Search and Implications to Biology. 26th International Symposium on Distributed Computing (DISC)
Pais, D., & Leonard, N. (2013). Adaptive network dynamics and evolution of leadership in collective migration Physica D: Nonlinear Phenomena DOI: 10.1016/j.physd.2013.04.014
Schulman, R., & Winfree, E. (2005). Self-replication and evolution of DNA crystals. In Advances in Artificial Life (pp. 734-743). Springer Berlin Heidelberg.
Shoval O, Goentoro L, Hart Y, Mayo A, Sontag E, & Alon U (2010). Fold-change detection and scalar symmetry of sensory input fields. Proceedings of the National Academy of Sciences of the United States of America, 107(36): 15995-6000.
Valiant, L.G. (2009) Evolvability. Journal of the ACM 56(1): 3.