## Introduction to Algorithmic Biology: Evolution as Algorithm

As Aaron Roth wrote on Twitter — and as I’m betting my career: “Rigorously understanding evolution as a computational process will be one of the most important problems in theoretical biology in the next century. The basics of evolution are many students’ first exposure to ‘computational thinking’ — but we need to finish the thought!”

Last week, I tried to continue this thought for Oxford students at a joint meeting of the Computational Society and Biological Society. On May 22, I gave a talk on algorithmic biology. I want to use this post to share my (shortened) slides as a pdf file and give a brief overview of the talk.

If you didn’t get a chance to attend, maybe the title and abstract will get you reading further:

Algorithmic Biology: Evolution is an algorithm; let us analyze it like one.

Evolutionary biology and theoretical computer science are fundamentally interconnected. In the work of Charles Darwin and Alfred Russel Wallace, we can see the emergence of concepts that theoretical computer scientists would later hold as central to their discipline: ideas like asymptotic analysis, the role of algorithms in nature, distributed computation, and analogy from man-made to natural control processes. By recognizing evolution as an algorithm, we can continue to apply the mathematical tools of computer science to solve biological puzzles – to build an algorithmic biology.

One of these puzzles is open-ended evolution: why do populations continue to adapt instead of getting stuck at local fitness optima? Or alternatively: what constraint prevents evolution from finding a local fitness peak? Many solutions have been proposed to this puzzle, with most being proximal – i.e. depending on the details of the particular population structure. But computational complexity provides an ultimate constraint on evolution. I will discuss this constraint, and the positive aspects of the resultant perpetual maladaptive disequilibrium. In particular, I will explain how we can use this to understand both ongoing long-term evolution experiments in bacteria, and the evolution of costly learning and cooperation in populations of complex organisms like humans.

Unsurprisingly, I’ve written about all these topics already on TheEGG, and so my overview of the talk will involve a lot of links back to previous posts. In this way, this post can serve as an analytic linkdex on algorithmic biology.

## British agricultural revolution gave us evolution by natural selection

This Wednesday, I gave a talk on algorithmic biology to the Oxford Computing Society. One of my goals was to show how seemingly technology-oriented disciplines (such as computer science) can produce foundational theoretical, philosophical, and scientific insights. So I started the talk with the relationship between domestication and natural selection. Something that I’ve briefly discussed on TheEGG in the past.

Today we might discuss artificial selection or domestication (or even evolutionary oncology) as applying the principles of natural selection to achieve human goals. This is only because we now take Darwin’s work as given. At the time that he was writing, however, Darwin actually had to make his argument in the other direction. Darwin’s argument proceeds from looking at the selection algorithms used by humans and then abstracting them to focus only on the algorithm and not on the agent carrying it out. Having made this abstraction, he can implement the role of the breeder through the distributed struggle for existence and thus get natural selection.
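To put the abstraction in modern terms (this framing is mine, not Darwin’s), the selection loop can be written so that the selecting agent appears only as a swappable parameter:

```python
# A toy selection loop (my framing, not Darwin's): the agent doing the
# selecting appears only as the `select` parameter.
import random

def evolve(population, select, mutate, generations=50):
    """Run selection and reproduction; only `select` knows who the selector is."""
    for _ in range(generations):
        survivors = select(population)
        population = [mutate(x) for x in random.choices(survivors, k=len(population))]
    return population

mutate = lambda x: x + random.gauss(0, 0.1)  # traits are toy numbers

# The breeder keeps the top tenth, selecting for a human goal...
breeder_select = lambda pop: sorted(pop)[-max(1, len(pop) // 10):]
# ...while the struggle for existence keeps whoever clears the median.
natural_select = lambda pop: sorted(pop)[len(pop) // 2:]

pop = [random.gauss(0, 1) for _ in range(100)]
print(max(evolve(pop, breeder_select, mutate)),
      max(evolve(pop, natural_select, mutate)))
```

Swapping `breeder_select` for `natural_select` changes who does the selecting but not the algorithm; that is exactly the move from artificial to natural selection.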

The inspiration is clearly from the technological to the theoretical. But there is a problem with my story.

Domestication of plants and animals is ancient. Old enough that we have cancers that arose in our domesticated helpers 11,000 years ago and persist to this day. Domestication in general — the fruit of the first agricultural revolution — can hardly qualify as a new technology in Darwin’s day. It would have been just as familiar to Aristotle, and yet he thought species were eternal.

Why wasn’t Aristotle or any other ancient philosopher inspired by the agriculture and animal husbandry of their day to arrive at the same theory as Darwin?

The ancients didn’t arrive at the same view because it wasn’t the domestication of the first agricultural revolution that inspired Darwin. It was something much more contemporary to him. Darwin was inspired by the British agricultural revolution of the 18th and early 19th century.

In this post, I want to sketch this connection between the technological development of the Georgian era and the theoretical breakthroughs in natural science in the subsequent Victorian era. As before, I’ll focus on evolution and algorithm.

## Space-time maps & tracking colony size with OpenCV in Python

One of the things that the Department of Integrated Mathematical Oncology at the Moffitt Cancer Center does very well is create an atmosphere that combines mathematics and experiment in cancer research. Fellow TheEGG blogger Robert Vander Velde is one of the new generation of cancer researchers working in this combined way. Since I left Tampa, I’ve had less opportunity to keep up with the work at the IMO, but occasionally I catch up on Slack.

A couple of years ago, Robert had a computer science question: one at the data analysis and visualization stage of the relationship between computer science and cancer. Given that I haven’t posted code on TheEGG in a long time, I thought I’d share some visualizations I wrote to address Robert’s question.

There are many ways to measure the size of populations in biology. Given that we use it in our game assay, I’ve written a lot about using time-lapse microscopy of evolving populations. But this isn’t the only — or most popular — approach. It is much more common to dilute populations heavily and then count colony forming units (CFUs). I’ve discussed this briefly in the context of measuring stag-hunting bacteria.

But you can also combine both approaches. And do time-lapse microscopy of the colonies as they form.

A couple of years ago, Robert Vander Velde and Andriy Marusyk were working on experiments that use colony forming units (CFUs) as a measure of populations. However, they wanted to dig deeper into the heterogeneous dynamics of CFUs by tracking the formation process through time-lapse microscopy. Robert asked me if I could help out with a bit of the computer vision, so I wrote a Python script for them to identify and track individual colonies through time. I thought that the code might be useful to others — or me in the future — so I wanted to write a quick post explaining my approach.
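The full script isn’t reproduced here, but a minimal sketch of this kind of pipeline, assuming 8-bit grayscale frames in which colonies are darker than the plate, might look like the following (the function names, thresholds, and greedy matching are illustrative choices of mine, not necessarily what the original script did):

```python
# A minimal sketch, not the original script: detect colonies in each frame
# with Otsu thresholding + connected components, then greedily link each
# colony to its nearest detection in the next frame.
import cv2

def find_colonies(frame, min_area=20):
    """Return (x, y, area) for each putative colony in an 8-bit grayscale frame."""
    blurred = cv2.GaussianBlur(frame, (5, 5), 0)  # suppress pixel noise
    # Otsu picks a global threshold; BINARY_INV makes dark colonies white.
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    return [(centroids[i][0], centroids[i][1], int(stats[i, cv2.CC_STAT_AREA]))
            for i in range(1, n)  # label 0 is the background
            if stats[i, cv2.CC_STAT_AREA] >= min_area]

def track_colonies(frames, max_jump=15.0):
    """Link detections across frames; each track is one colony's (x, y, area) history."""
    tracks = [[c] for c in find_colonies(frames[0])]
    for frame in frames[1:]:
        detections = find_colonies(frame)
        if not detections:
            continue
        for tr in tracks:
            x, y, _ = tr[-1]
            nearest = min(detections, key=lambda d: (d[0] - x) ** 2 + (d[1] - y) ** 2)
            if ((nearest[0] - x) ** 2 + (nearest[1] - y) ** 2) ** 0.5 <= max_jump:
                tr.append(nearest)
    return tracks

# e.g. frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in sorted_frame_paths]
```

Plotting each track’s area against time then gives per-colony growth curves, which is one simple way to get at the heterogeneous dynamics mentioned above.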

This post ended up trapped in the drafts box of TheEGG for a while, but I thought now is as good a time as any to share it. I don’t know where Robert’s work on this has gone since, or if the space-time visualizations I developed were of any use. Maybe he can fill us in in the comments or with a new guest post.

## Coarse-graining vs abstraction and building theory without a grounding

Back in September 2017, Sandy Anderson was tweeting about the mathematical oncology revolution. To which Noel Aherne replied with a thorny observation that “we have been curing cancers for decades with radiation without a full understanding of all the mechanisms”.

This led to a wide-ranging discussion and clarification of what is meant by terms like mechanism. I had meant to blog about these conversations when they were happening, but the post fell through the cracks and into the long to-write list.

This week, to continue celebrating Rockne et al.’s 2019 Mathematical Oncology Roadmap, I want to revisit this thread.

And not just in cancer. Although my starting example will focus on VEGF and cancer.

I want to focus on a particular point that came up in my discussion with Paul Macklin: what is the difference between coarse-graining and abstraction? In the process, I will argue that if we want to build mechanistic models, we should aim not at explaining new unknown effects but rather at effects where we already have great predictive power from simple effective models.

Since Paul and I often have useful disagreements on Twitter, hopefully writing about it on TheEGG will also prove useful.

## Game landscapes: from fitness scalars to fitness functions

My biology writing focuses heavily on fitness landscapes and evolutionary games. On the surface, these might seem fundamentally different from each other, with their only common feature being that they are both about evolution. But there are many ways that we can interconnect these two approaches.

The most popular connection is to view these models as two different extremes in terms of time-scale.

When we are looking at evolution on short time-scales, we are primarily interested in which of a limited number of extant variants will take over the population, or how they’ll co-exist. We can take the effort to model the interactions of the different types with each other, and we summarize these interactions as games.

But when we zoom out to longer and longer timescales, the importance of these short-term dynamics diminishes. And we start to worry about how new types arise and take over the population. At this timescale, the details of the type interactions are not as important, and we can just focus on the first-order term: fitness. What starts to matter is how the fitness of nearby mutants compares, so that we can reason about long-term evolutionary trajectories. We summarize this as fitness landscapes.

From this perspective, the fitness landscapes are the more foundational concept. Games are the details that only matter in the short term.

But this isn’t the only perspective we can take. In my recent contribution with Peter Jeavons to Russell Rockne’s 2019 Mathematical Oncology Roadmap, I wanted to sketch a different perspective. In this post, I want to develop that alternative perspective and discuss how ‘game landscapes’ generalize the traditional view of fitness landscapes. In this way, the post can be viewed as my third entry on progressively more general views of fitness landscapes. The previous two were on generalizing the NK-model, and on replacing scalar fitness by a probability distribution.

Here, I will take this exploration of fitness landscapes a little further and finally connect to games. Nothing profound will be said, but maybe it will give another look at a well-known object.
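To make the contrast concrete, here is a toy illustration (my own sketch, not the formalism of the roadmap contribution): a classic landscape maps each genotype to a scalar, while a game landscape maps each genotype to a payoff function of the population composition.

```python
# A toy contrast (illustrative only) between a classic fitness landscape
# and a 'game landscape' over binary genotypes of length 3.
import itertools

genotypes = list(itertools.product([0, 1], repeat=3))

# Classic landscape: fitness is a fixed number per genotype.
scalar_fitness = {g: sum(g) for g in genotypes}  # toy additive landscape

def payoff(g, h):
    """Toy pairwise payoff: matching bits cooperate, mismatches compete."""
    return sum(1 if a == b else -1 for a, b in zip(g, h))

def game_fitness(g, freqs):
    """Expected payoff of genotype g against population frequencies `freqs`."""
    return sum(f * payoff(g, h) for h, f in freqs.items())

# Holding the population composition fixed recovers an ordinary scalar
# landscape; letting the composition vary is what the game landscape adds.
uniform = {h: 1 / len(genotypes) for h in genotypes}
induced_scalar = {g: game_fitness(g, uniform) for g in genotypes}
```

Fixing the population composition collapses the game landscape back to an ordinary fitness landscape, which is the sense in which games generalize scalar fitness.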

## Colour, psychophysics, and the scientific vs. manifest image of reality

Recently on TheEGG, I’ve been writing a lot about the differences between effective (or phenomenological) and reductive theories. Usually, I’ve confined this writing to evolutionary biology, especially the tension between effective and reductive theories in the biology of microscopic systems. For why this matters to evolutionary game theory, see Kaznatcheev (2017, 2018).

But I don’t think that microscopic systems are the funnest place to see this interplay. The funnest place to see this is in psychology.

In the context of psychology, you can add an extra philosophical twist. Instead of differentiating between reductive and effective theories, a more drastic difference can be drawn between the scientific and manifest image of reality.

In this post, I want to briefly talk about how our modern theories of colour vision developed. This is a nice example of a good effective theory leading the way before any reductive basis was known. And with that background in mind, I want to ask the question: are colours real? Maybe this will let me connect to some of my old work on interface theories of perception (see Kaznatcheev, Montrey, and Shultz, 2014).

## Constant-sum games as a way from non-cell autonomous processes to constant tumour growth rate

A lot of thinking in cancer biology seems to be focused on cell-autonomous processes. This is the (overly) reductive view that key properties of cells, such as fitness, are intrinsic to the cells themselves and not a function of their interaction with other cells in the tumour. As far as starting points go, this is reasonable. But in many cases, we can start to go beyond this cell-autonomous starting point and consider non-cell-autonomous processes. This is when the key properties of a cell are not a function of just that cell but also its interaction partners. As an evolutionary game theorist, I am clearly partial to this view.

Recently, I was reading yet another preprint that has observed non-cell-autonomous fitness in tumours. In this case, Johnson et al. (2019) spotted the Allee effect in the growth kinetics of cancer cells even at extremely low densities (seeding in vitro at <200 cells in a 1 mm^3 well). This is an interesting paper, and although not explicitly game-theoretic in its approach, I think it is worth reading for evolutionary game theorists.

Johnson et al.'s (2019) approach is not explicitly game-theoretic because they consider their in vitro populations as a monomorphic clonal line, and thus don't model interactions between types. Instead, they attribute non-cell-autonomous processes to density dependence of the single type on itself. In this setting, they reasonably define the cell-autonomous null-model as constant exponential growth, i.e. $\dot{N}_T = w_T N_T$ for some constant fitness $w_T$ and total tumour size $N_T$.

It might also be tempting to use the same model to capture cell-autonomous growth in game-theoretic models. But this would be mistaken. Such a model is only effectively cell-autonomous at the level of the whole tumour, and could hide non-cell-autonomous fitness at the level of the different types that make up the tumour. This apparently cell-autonomous total growth will happen whenever the type interactions are described by constant-sum games.
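To see the correspondence, here is a quick sketch under standard replicator-style assumptions (my shorthand for illustration; the full post works through the details). Consider two types $A$ and $B$ with proportion $p = N_A/N_T$, and let a symmetric payoff matrix $G$ set the growth rates: $w_A = pG_{AA} + (1-p)G_{AB}$ and $w_B = pG_{BA} + (1-p)G_{BB}$. The whole tumour then grows as $\dot{N}_T = w_A N_A + w_B N_B = \bar{w}N_T$, where $\bar{w} = p^2 G_{AA} + p(1-p)(G_{AB} + G_{BA}) + (1-p)^2 G_{BB}$. If the game is constant-sum, so that $G_{XY} + G_{YX} = c$ for every pair of types, then $G_{AA} = G_{BB} = c/2$ and $G_{AB} + G_{BA} = c$, which gives $\bar{w} = \frac{c}{2}(p^2 + 2p(1-p) + (1-p)^2) = \frac{c}{2}$. So the tumour as a whole obeys $\dot{N}_T = \frac{c}{2}N_T$: constant exponential growth, no matter how its internal composition shifts.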

Given the importance of constant-sum games (more famously known as zero-sum games) to the classical game theory literature, I thought that I would write a quick introductory post about this correspondence between non-cell-autonomous constant-sum games and effectively cell-autonomous growth at the level of the whole tumour.

## Quick introduction: the algorithmic lens

Computers are a ubiquitous tool in modern research. We use them for everything from running simulation experiments and controlling physical experiments to analyzing and visualizing data. For almost any field ‘X’ there is probably a subfield of ‘computational X’ that uses and refines these computational tools to further research in X. This is very important work and I think it should be an integral part of all modern research.

But this is not the algorithmic lens.

In this post, I will try to give a very brief description (or maybe just a set of pointers) of the algorithmic lens, and of what we should imagine when we see an ‘algorithmic X’ subfield of some field X.

## From perpetual motion machines to the Entscheidungsproblem

There seems to be a tendency to use the newest technology of the day as a metaphor for making sense of our hardest scientific questions. These metaphors are often vague and imprecise. They tend to oversimplify the scientific question and also misrepresent the technology. This isn’t useful.

But the pull of this metaphor also tends to transform the technical disciplines that analyze our newest tech into fundamental disciplines that analyze our universe. This was the case for many aspects of physics, and I think it is currently happening with aspects of theoretical computer science. This is very useful.

So, let’s go back in time to the birth of modern machines. To the water wheel and the steam engine.

I will briefly sketch how the science of steam engines developed and how it dealt with perpetual motion machines. From here, we can jump to the analytic engine and the modern computer. I’ll suggest that the development of computer science has followed a similar path — with the Entscheidungsproblem and its variants serving as our perpetual motion machine.

The science of steam engines successfully universalized itself into thermodynamics and statistical mechanics. These are seen as universal disciplines that are used to inform our understanding across the sciences. Similarly, I think that we need to universalize theoretical computer science and make its techniques more common throughout the sciences.

## Fitness distributions versus fitness as a summary statistic: algorithmic Darwinism and supply-driven evolution

For simplicity, especially in the fitness landscape literature, fitness is often treated as a scalar — usually a real number. If our fitness landscape is on genotypes then each genotype has an associated scalar value of fitness. If our fitness landscape is on phenotypes then each phenotype has an associated scalar value of fitness.

But this is a little strange. After all, two organisms with the same genotype or phenotype don’t necessarily have the same number of offspring or other life outcomes. As such, we’re usually meant to interpret the value of fitness as the mean of some random variable, like the number of children. But is the mean the right summary statistic to use? And if it is, then which mean: arithmetic, geometric, or some other?
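As a toy illustration of why the choice of mean matters (my example, in the classic bet-hedging setting), compare a steady genotype to one whose offspring number booms or busts multiplicatively:

```python
# A toy illustration, not from the post: arithmetic and geometric means
# rank these two genotypes' per-generation offspring numbers differently.
import random
import statistics

random.seed(0)
steady = [1.05] * 1000                                     # always 1.05 offspring
risky = [random.choice([1.6, 0.6]) for _ in range(1000)]   # 50/50 boom or bust

for name, w in [("steady", steady), ("risky", risky)]:
    print(name,
          "arithmetic:", round(statistics.fmean(w), 3),
          "geometric:", round(statistics.geometric_mean(w), 3))
# risky wins on the arithmetic mean (~1.1 vs 1.05), but its geometric mean
# is below 1 (~0.98): a lineage multiplying by these draws each generation
# shrinks in the long run, while the steady lineage grows.
```

For multiplicative growth across generations, the geometric mean is the relevant summary; for expected counts within a single generation, the arithmetic mean is. This is part of why a single scalar can mislead.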

One way around this is to simply not use a summary statistic, and instead treat fitness as a random variable with a corresponding distribution. For many developmental biologists, this would still be a simplification since it ignores many other aspects of life-histories — especially related to reproductive timing. But it is certainly an interesting starting point. And one that I don’t see pursued enough in the fitness landscape literature.

The downside is that it makes an already pretty vague and unwieldy model — i.e. the fitness landscape — even less precise and even more unwieldy. As such, we should pursue this generalization only if it brings us something concrete and useful. In this post I want to discuss two aspects of this: better integration of evolution with computational learning theory and thinking about supply driven evolution (i.e. arrival of the fittest). In the process, I’ll be drawing heavily on the thoughts of Leslie Valiant and Julian Z. Xue.