## Five motivations for theoretical computer science

There are some situations, perhaps lucky ones, where it is felt that an activity needs no external motivation or justification.  For the rest, it can be helpful to think of what the task at hand can be useful for. This of course doesn’t answer the larger question of what is worth doing, since it just distributes the burden somewhere else, but establishing these connections seems like a natural part of an answer to the larger question.

Along those lines, the following are five intellectual areas for whose study theoretical computer science concepts and their development can be useful – therefore, a curiosity about these areas can provide some motivation for learning about those cstheory concepts or developing them. They are arranged from the likely more obvious to most people to the less so: technology, mathematics, science, society, and philosophy. This post could also serve as an homage to delayed gratification (perhaps with some procrastination mixed in), having been finally written up more than three years after first discussing it with Artem.

## Evolutionary non-commutativity suggests novel treatment strategies

In the Autumn of 2011 I received an email from Jacob Scott, now a good friend and better mentor, who was looking for an undergraduate to code an evolutionary simulation. Jake had just arrived in Oxford to start his DPhil in applied mathematics and by chance had dined at St Anne’s College with Peter Jeavons, then a tutor of mine, the evening before. Jake had outlined his ideas, Peter had supplied a number of email addresses, Jake sent an email and I uncharacteristically replied saying I’d give it a shot. These unlikely events would led me to where I am today — a DPhil candidate in the Oxford University Department of Computer Science. My project with Jake was a success and I was invited to speak at the 2012 meeting of the Society of Mathematical Biology in Knoxville, TN. Here I met one of Jake’s supervisors, Alexander Anderson, who invited me to visit the Department of Integrated Mathematical Oncology at the Moffitt Cancer Center and Research Institute for a workshop in December of that year. Here Dr. Anderson and I discussed one of the key issues with the work I will present in this post, issues that now form the basis of my DPhil with Dr. Anderson as one of two supervisors. Fittingly, the other is Peter Jeavons.

Jake was considering the problem of treating and avoiding drug resistance and in his short email provided his hypothesis as a single question: “Can we administer a sequence of drugs to steer the evolution of a disease population to a configuration from which resistance cannot emerge?”

In Nichol et al. (2015), we provide evidence for an affirmative answer to this question. I would like to use this post to introduce you to our result, and discuss some of the criticisms.

## Evolution is a special kind of (machine) learning

Theoretical computer science has a long history of peering through the algorithmic lens at the brain, mind, and learning. In fact, I would argue that the field was born from the epistemological questions of what can our minds learn of mathematical truth through formal proofs. The perspective became more scientific with McCullock & Pitts’ (1943) introduction of finite state machines as models of neural networks and Turing’s B-type neural networks paving the way for our modern treatment of artificial intelligence and machine learning. The connections to biology, unfortunately, are less pronounced. Turing ventured into the field with his important work on morphogenesis, and I believe that he could have contributed to the study of evolution but did not get the chance. This work was followed up with the use of computers in biology, and with heuristic ideas from evolution entering computer science in the form of genetic algorithms. However, these areas remained non-mathematical, with very few provable statements or non-heuristic reasoning. The task of making strong connections between theoretical computer science and evolutionary biology has been left to our generation.

Although the militia of cstheorists reflecting on biology is small, Leslie Valiant is their standard-bearer for the steady march of theoretical computer science into both learning and evolution. Due in part to his efforts, artificial intelligence and machine learning are such well developed fields that their theory branch has its own name and conferences: computational learning theory (CoLT). Much of CoLT rests on Valiant’s (1984) introduction of probably-approximately correct (PAC) learning which — in spite of its name — is one of the most formal and careful ways to understand learnability. The importance of this model cannot be understated, and resulted in Valiant receiving (among many other distinctions) the 2010 Turing award (i.e. the Nobel prize of computer science). Most importantly, his attention was not confined only to pure cstheory, he took his algorithmic insights into biology, specifically computational neuroscience (see Valiant (1994; 2006) for examples), to understand human thought and learning.

Like any good thinker reflecting on biology, Valiant understands the importance of Dobzhansky’s observation that “nothing in biology makes sense except in the light of evolution”. Even for the algorithmic lens it helps to have this illumination. Any understanding of learning mechanisms like the brain is incomplete without an examination of the evolutionary dynamics that shaped these organs. In the mid-2000s, Valiant embarked on the quest of formalizing some of the insights cstheory can offer evolution, culminating in his PAC-based model of evolvability (Valiant, 2009). Although this paper is one of the most frequently cited on TheEGG, I’ve waited until today to give it a dedicated post.

## Mathematics in finance and hiding lies in complexity

Sir Andrew Wiles

Mathematics has a deep and rich history, extending well beyond the 16th century start of the scientific revolution. Much like literature, mathematics has a timeless quality; although its trends wax and wane, no part of it becomes out-dated or wrong. What Diophantus of Alexandria wrote on solving algebraic equations in the 3rd century was still as true in the 16th, 17th, or today. In fact, it was in 1637 in the margins of Diophantus’ Arithmetica that Pierre de Fermat scribbled the statement of his Last Theorem. that the margin was too narrow to contain[1]. In modern notation it is probably one of the most famous Diophantine equations $a^n + b^n = c^n$ with the assertion that it has no solutions for $n > 2$ and $a,b,c$ as positive integers. A statement that almost anybody can understand, but one that is far from easy to prove or even approach[2].

## Programming language for biochemistry

Computer scientists that think of nature as literally computing, often take the stance that biological organisms are nothing more than protein interaction networks. For example, this is the stance that Leslie Valiant (2009) takes when defining ecorithms: biology is just a specialization of computer science focused on evolvable circuits. User @exploderator summarized the realist computational view of biology on Reddit while answering what theoretical computer science can offer biology:

[B]iology is primarily chemo-computation, chemical information systems and computational hardware.
Theoretical comp sci is the only field that is actually specifically dedicated to studying the mathematics / logic of computation. Therefore, although biology is an incredibly hard programming problem (only a fool thinks nature simple), it is indeed more about programming and less about the hardware it’s running on.

Although it is an easy stance for a theoretician to take, it is a little bit more involved for a molecular biologist, chemist, or engineer. Yet for the last 30 years, even experimentalists have been captivated by this computational realism and promise of engineering molecular devices (Drexler, 1981). Half a year ago, I even reviewed Bonnet et al. (2013) taking steps towards building transcriptors. They are focusing on the hardware side of biological computation and building a DNA-analogue of the von Neumann architecture. However, what we really need is a level of abstraction: a chemical programming language that can be compiled into biocompatible reactions.

## Semi-smooth fitness landscapes and the simplex algorithm

As you might have guessed from my strange and complicated name, I’m Russian. One of the weird features of this is that even though I have never had to experience war, I still feel a strong cultural war-weariness. This stems from an ancestoral memory of the Second World War, a conflict that had an extremely disruptive affect on Russian society. None of my great-grandfathers survived the war; one of them was a train engineer that died trying to drive a train of provisions over the Road of Life to resuply Leningrad during its 29 month seige. Since the Germans blocked all the land routes, part of road ran over the ice on Lake Ladoga — trucks had to be optimally spaced to not crack the ice that separated them from a watery grave while maximizing the amount of supplies transported into the city. Leonid Kantorovich — the Russian mathematician and economist that developed linear programming as the war was starting in western Europe — ensured safety by calculating the optimal distance between cars depending on the ice thickness and air temperature. In the first winter of the road, Kantorovich would personally walk between trucks on the ice to ensure his guidelines were followed and to reassure the men of the reliability of mathematical programming. Like his British counterpart, Kantorovich was aplying the algorithmic lens to help the Allied war effort and the safety of his people. Although I can never reciprocate the heroism of these great men, stories like this convince me that the algorithmic lens can provide a powerful perspective in economics, engineering, or science. This is one of the many inspirations behind my most recent project (Kaznatcheev, 2013) applying the tools of theoretical computer science and mathematical optimization — such as linear programming — to better understand the rate of evolutionary dynamics.

## What can theoretical computer science offer biology?

If there is anything definitive that can be said about biology then it is that biology is messy and complicated. The stereotypical image of a biology major is in the library memorizing volumes of loosely (at best) related facts only to have them overturned in the next semester. Concepts are related only vaguely, to the point that it looks like stamp-collecting to outsiders. The only unifying theme is evolution, and even that is presented with a smorgasbord of verbal and mathematical models, with many lacking predictive power or sometimes even solid empirical foundations. This might seem like the polar opposite of a theoretical computer scientist with her strict hierarchies and logical deductions. Even the complexity zoo seems perfectly tame compared to any biological taxonomy. However, since I’ve been promoting algorithmic theories of biology and even trying my hand at applying cstheory to models of evolution (Kaznatcheev, 2013), I must think that there is some possibility for a productive partnership. So, what can theoretical computer science offer biology? CStheory can provide the abstractions and technical tools to systematize and organize biology’s heuristic models.

## Computational complexity of evolutionary equilibria

The first half of the 20th century is famous for revolutionary changes — paradigm shifts as Kuhn would say — across the sciences and other systems of thought. Disheartened by the scars of the First World War, European thinkers sought refuge by shifting their worldviews away from those of their fathers. In fields like biology, this meant looking for guidance to your grandfathers, instead. The founders of the modern synthesis reconciled the fading ideas of Wallace’s natural selection with Mendelian genetics. In the process, they unified many branches of biology, that at the dawn of the 20th century had been diverging, into a single paradigm that persists today. A return to evolution by natural selection illuminated their field and ended the eclipse of Darwinism. At the same time, mathematicians questioned Hilbert’s formalism and Russell’s logicism as valid foundations for their own field. As a result, they formalized mechanistic calculation and logical thought as computation and founded theoretical computer science to study its power and limitations. Even though some pioneers — like Alan Turing — kept an eye on biology, the two streams of thought did not converge. The coming of the Second World War pushed both fields away from each other and deep foundational questions, entrenching them in applied and technological work.

For the rest of the 20th century, the fields remained largely independent. Computer science only looked to biology for vague inspiration of heuristics in the form of evolutionary computing (Holland, 1975), and biologists looked at computer science as an engineering or technical field that could only provide them with applied tools like bioinformatics. Neither field saw in the other a partner for addressing deep theoretical questions. As I mentioned previously, my goal is to remedy this by applying ideas from analysis of algorithms and computational complexity to fundamental questions in biology. Sometime in late October, I tweeted my joy at seeing evolution clearly through the algorithmic lens. On Friday the 23rd, after much delay, I turned this into an ArXiv preprint ambitiously titled “Complexity of evolutionary equilibria in static fitness landscapes“. The paper combines the discussion of evolution from my previous few blog posts with three simple technical results. Although it was written for theoretical computer scientists, I tried to make it accessible to mathematical biologists as well; the hope is to serve as a launching point for discussion between the two disciplines.

## NK and block models of fitness landscapes

On February 12, 2001, the human genome project released its first formal report. It was a great day for biology, and a wonderful birthday present for Darwin. However, it was also a humbling confrontation with complexity and prompted Steven Jay Gould to write in the New York Times:

[The Human Genome project revealed that] Home sapiens possesses between 30,000 and 40,000 … [only twice as many as the tiny roundworm] … The key to complexity is not more genes, but more combinations and interactions generated by fewer units of code … [their function] cannot be predicted from the separate underlying parts alone.

The one gene to one protein paradigm was overthrown, the 40k genes were simply not enough to generate the over 140k identified proteins in the human body. For evolutionary biology, the new view meant that changing one gene could affect (and would affect, in most cases) many proteins and thus have several different contributions to fitness. Biologists needed a model of fitness landscapes where the genes could interact with each other, allow for varying combinations, and manifest epistasis.