What can theoretical computer science offer biology?

If there is anything definitive that can be said about biology, it is that biology is messy and complicated. The stereotypical image of a biology major is of a student in the library memorizing volumes of loosely (at best) related facts, only to have them overturned the next semester. Concepts are related only vaguely, to the point that the field looks like stamp collecting to outsiders. The only unifying theme is evolution, and even that is presented through a smorgasbord of verbal and mathematical models, many lacking predictive power or sometimes even solid empirical foundations. This might seem like the polar opposite of a theoretical computer scientist with her strict hierarchies and logical deductions. Even the complexity zoo seems perfectly tame compared to any biological taxonomy. However, since I’ve been promoting algorithmic theories of biology and even trying my hand at applying cstheory to models of evolution (Kaznatcheev, 2013), I must think that there is some possibility for a productive partnership. So, what can theoretical computer science offer biology? CStheory can provide the abstractions and technical tools to systematize and organize biology’s heuristic models.
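To make the flavour of such models concrete: the cited paper studies how long evolution takes to reach a local fitness peak in a static landscape. Here is a minimal toy sketch (my own illustration, not code from the paper) of the basic object — a greedy adaptive walk by point mutations on a random fitness landscape over binary genotypes. The landscape here is uncorrelated ("house of cards"), a deliberately simple assumption; the interesting complexity questions arise on more structured landscapes.

```python
import random

def random_landscape(n, seed=0):
    """Assign an i.i.d. random fitness to each of the 2^n genotypes.
    (A toy 'house of cards' landscape; realistic models are more structured.)"""
    rng = random.Random(seed)
    return {g: rng.random() for g in range(2 ** n)}

def adaptive_walk(fitness, n, start=0):
    """Greedy point-mutation hill climbing: repeatedly move to the fittest
    one-bit-mutant neighbour until no neighbour is fitter (a local peak)."""
    g, steps = start, 0
    while True:
        neighbours = [g ^ (1 << i) for i in range(n)]  # flip each of the n bits
        best = max(neighbours, key=fitness.get)
        if fitness[best] <= fitness[g]:
            return g, steps  # local fitness peak reached
        g, steps = best, steps + 1

n = 10
f = random_landscape(n)
peak, steps = adaptive_walk(f, n)
# The walk must end at a genotype with no fitter one-bit neighbour:
assert all(f[peak] >= f[peak ^ (1 << i)] for i in range(n))
```

The algorithmic question is then: as `n` grows, how many steps can such walks take, and can *any* local rule find a peak efficiently? That is exactly the kind of question where the tools of computational complexity start to bite.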

One of the best proponents of theoretical computer science as a way of addressing the Big Questions, instead of being confined to serving technology, has been Scott Aaronson. It is unsurprising that his blog has offered some of the best words for describing the aid that cstheory can offer to biological understanding. In yesterday’s post on the NSA and computational complexity, Aaronson — with attribution to mathematician Greg Kuperberg — provided the following words regarding the applicability of cstheory to cryptography (these words apply directly to biology, too):

[Theoretical computer science offers] what mathematics itself has sought to do for everything since Euclid! That is, when you see an unruly mess of insights, related to each other in some tangled way, systematize and organize it. Turn the tangle into a hierarchical tree (or dag). Isolate the minimal assumptions … on which each conclusion can be based, and spell out all the logical steps needed to get from here to there—even if the steps seem obvious or boring. Any time anyone has tried to do that, it’s been easy for the natives of the unruly wilderness to laugh at the systematizing newcomers: the latter often know the terrain less well, and take ten times as long to reach conclusions that are ten times less interesting. And yet, in case after case, the clarity and rigor of the systematizing approach has eventually won out.

Although the algorithmic approach to biology might be slow at first, and will restate obvious things many times, eventually it will systematize the models. With a clear taxonomy of models, and the powerful tools of analysis of algorithms and computational complexity, biologists will be able to better navigate the assumptions they make as they model the world around them. I expect this to be especially important for models of evolution, and for other fields where extensive and detailed empirical measurements are hard to achieve. A rigorous understanding of how model assumptions and conclusions relate will allow biologists to make minimally restrictive models that agree with what has been measured, without making extraneous presuppositions about the physical world (beyond the restrictions imposed by the fact that it is our minds that have to understand these theories).

This unification of fields will not be smooth going. As Aaronson answered earlier to my question on the prospects of computational complexity in biology:

The central difficulty is obvious: to whatever extent your model is “real” CS theory, biologists and social scientists will probably deride it as too idealized; conversely, to whatever extent biologists and social scientists like your model, it will probably have too many “arbitrary” aspects for CS theorists to prove anything clean about it.

Let me put it this way: to connect computational complexity (as actually practiced by computational complexity theorists) to physics (as actually practiced by physicists), seems to me like digging a tunnel between England and France. Highly nontrivial, but by 1994 people finally managed to do it!

To form an equally-compelling link between computational complexity and biological evolution feels to me like digging a tunnel between England and the US. A worthy aspiration, but orders of magnitude more ambitious!

Progress will require convincing from both sides. In computer science we will need to consider more heuristic approaches, and the biologists will have to put up with our initially slow (and sometimes tangential) progress. However, if we start digging the tunnel from both ends then we can meet somewhere at the mid-Atlantic ridge.

In a broader context, we can see this organization of thought as an application of algorithmic philosophy, extending past simple logical analysis to a serious consideration of the laws of computation. Scott Aaronson (2011) explains why philosophers should be interested in joining this approach. Hopefully, with computer science, philosophy, and biology working together, we can untangle and understand the messy living world.

Aaronson, Scott (2011). Why Philosophers Should Care About Computational Complexity. arXiv: 1108.1791v3

Kaznatcheev, Artem (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv: 1308.5094v1

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

15 Responses to What can theoretical computer science offer biology?

  1. SS says:

    The reason why the progress of computational biology is slow is not all because of the complexity of bio and such.
    Nowadays, we tend to care more about publications, and one piece of code gets over-used by biologists in every system.
    Much more descriptive analysis, more correlation-hunting in these big-data times…
    The data is overwhelming, but the information extracted from it is shoddy.

    • Theoretical computer science is not about code, or about providing software to biologists. I think the algorithmic lens is about organizing the heuristic models that theoretical biologists currently use into abstract models that can be built into a coherent theory. This article was shared on r/bioinformatics, and a lot of people there seemed to have interpreted my writing the same way you did, but I am not writing about computational biology (see the opening of this post). Somebody on reddit expressed the focus I intended even more clearly here.

      I agree with you that there is a lot of crappy data, and Philip Gerlee captured that sentiment well.

  2. Excellent post. The quote from Scott Aaronson is a description of the old and well-tested Cartesian method. However, I have questions: (1) are biologists not aware of the Cartesian method? (I don’t think so) (2) why don’t CS theory specialists do a lot of good for biologists by spending some time organizing their findings (or writing some programs to this effect)? To me, a geometer with only recent interests in CS and biological vision, this matter appears a lot more complex. If I only look at explanations of biological vision versus computer vision, then I see that CS is basically incapable of proposing anything close to what biologists (or neuroscientists) are studying, and conversely, that biologists play disproportionately more with statistical tools than with algorithmic ones. So, in conclusion, I’d say that this is an effect of bad communication and of the noise created by the mechanism described in SS’s comment.
