## Idealization vs abstraction for mathematical models of evolution

This week I was in Turku, Finland for the annual congress of the European Society for Evolutionary Biology. I presented in the symposium on mathematical models in evolutionary biology organized by Guy Cooper, Matishalin Patel, Tom Scott, and Asher Leeks. It was a fun. It was also a big challenge given the short ten minute format. I decided to use my ten minutes to try to convince the audience that we should consider not just idealized models but also abstractions. So after my typical introduction of computational vs algorithmic biology, I switched to talking about triangles. If you would like, dear reader, then you can watch the whole session online (or grab my slides as pdf). In this post, I just want to focus on the distinction between idealized vs. abstract models.

Just as in my ESEB talk, I’ll use triangles to explain the distinction between idealized vs. abstract models.

## Generating random power-law graphs

‘Power-law’ is one of the biggest buzzwords in complexology. Almost everything is a power-law. I’ve even used it to sell my own work. But most work that deals in power-laws tends to lack rigour. And just establishing that something is a power-law shouldn’t make us feel that it is more connected to something else that is a power-law. Cosma Shalizi — the great critic of sloppy thinking in complexology — has an insightful passage on power-laws:

[T]here turn out to be nine and sixty ways of constructing power laws, and every single one of them is right, in that it does indeed produce a power law. Power laws turn out to result from a kind of central limit theorem for multiplicative growth processes, an observation which apparently dates back to Herbert Simon, and which has been rediscovered by a number of physicists (for instance, Sornette). Reed and Hughes have established an even more deflating explanation (see below). Now, just because these simple mechanisms exist, doesn’t mean they explain any particular case, but it does mean that you can’t legitimately argue “My favorite mechanism produces a power law; there is a power law here; it is very unlikely there would be a power law if my mechanism were not at work; therefore, it is reasonable to believe my mechanism is at work here.” (Deborah Mayo would say that finding a power law does not constitute a severe test of your hypothesis.) You need to do “differential diagnosis”, by identifying other, non-power-law consequences of your mechanism, which other possible explanations don’t share. This, we hardly ever do.

The curse of this multiple-realizability comes up especially when power-laws intersect with the other great field of complexology: networks.

I used to be very interested in this intersection. I was especially excited about evolutionary games on networks. But I was worried about some of the arbitrary seeming approaches in the literature to generating random power-law graphs. So before starting any projects with them, I took a look into my options. Unfortunately, I didn’t go further with the exploration.

Recently, Raoul Wadhwa has gone much more in-depth in his thinking about graphs and networks. So I thought I’d share some of my old notes on generating random power-law graphs in the hope that they might be useful to Raoul. These notes are half-baked and outdated, but maybe still fun.

Hopefully, you will find them entertaining, too, dear reader.

## Closing the gap between quantum and deterministic query complexity for easy to certify total functions

Recently, trying to keep with my weekly post schedule, I’ve been a bit strapped for inspiration. As such, I’ve posted a few times on a major topic from my past life: quantum query complexity. I’ve mostly tried to describe some techniques for (lower) bounding query complexity like the negative adversary method and span programs. But I’ve never really showed how to use these methods to actually set up interesting bounds.

Since I am again short of a post, I thought I’d share this week a simple proof of a bound possible with these techniques. This is based on an old note I wrote on 19 April 2011.

One of the big conjectures in quantum query complexity — at least a half decade ago when I was worrying about this topic — is that quantum queries give you at most a quadratic speedup over deterministic queries for total functions. In symbols: $D(f) = O(Q^2(f))$. Since Grover’s algorithm can give us a quadratic quantum speed-up for arbitrary total functions, this conjecture basically says: you can’t do better than Grover.

In this post, I’ll prove a baby version of this conjecture.

Let’s call a Boolean total-function easy to certify if one side of the function has a constant-length certificate complexity. I’ll prove that for easy-to-certify total functions, $D(f) = O(Q^2(f))$.

This is not an important result, but I thought it is a cute illustration of standard techniques. And so it doesn’t get lost in my old pdf, I thought I’d finally convert it to a blog post. Think of this as a simple application of the adversary method.

## The gene-interaction networks of easy fitness landscapes

Since evolutionary fitness landscapes have been a recurrent theme on TheEGG, I want to return, yet again, to the question of finding local peaks in fitness landscapes. In particular, to the distinction between easy and hard fitness landscapes.

Roughly, in easy landscapes, we can find local peaks quickly and in hard ones, we cannot. But this is very vague. To be a little more precise, I have to borrow the notion of orders of growth from the asymptotic analysis standard in computer science. A family of landscapes indexed by a size n (usually corresponding to the number of genes in the landscape) is easy if a local fitness optimum can be found in the landscapes in time polynomial in n and hard otherwise. In the case of hard landscapes, we can’t guarantee to find a local fitness peak and thus can sometimes reason from a state of perpetual maladaptive disequilibrium.

In Kaznatcheev (2019), I introduced this distinction to biology. Since hard landscapes have more interesting properties which are more challenging to theoretical biologist’s intuitions, I focused more on this. This was read — perhaps rightly — as me advocating for the existence or ubiquity of hard landscapes. And that if hard landscapes don’t occur in nature then my distinction is pointless. But I don’t think this is the most useful reading.

It certainly would be fun if hard landscapes were a feature of nature since they give us a new way to approach certain puzzles like the maintenance of cooperation, the evolution of costly learning, or open-ended evolution. But this is an empirical question. What isn’t a question is that hard landscape are a feature of our mental and mathematical models of evolution. As such, all — or most, whatever that means — fitness landscapes being easy is still exciting for me. It means that the easy vs hard distinction can push us to refine our mental models such that if only easy landscapes occur in nature then our models should only be able to express easy landscapes.

In other words, using computational complexity to build upper-bounds arguments (that on certain classes of landscapes, local optima can be found efficiently) can be just as fun as lower-bounds arguments (that on certain classes of landscapes, evolution requires at least a super-polynomial effort to find any local fitness peak). However, apart from a brief mention of smooth landscapes, I did not stress the upper-bounds in Kaznatcheev (2019).

Now, together with David Cohen and Peter Jeavons, I’ve taken this next step — at least in the cstheory context, we still need to write on the biology. So in this post, I want to talk briefly about a biological framing of Kaznatcheev, Cohen & Jeavons (2019) and the kind of fitness landscapes that are easy for evolution.

## Hiding behind chaos and error in the double pendulum

If you want a visual intuition for just how unpredictable chaotic dynamics can be then the go-to toy model is the double pendulum. There are lots of great simulations (and some physical implementations) of the double pendulum online. Recently, /u/abraxasknister posted such a simulation on the /r/physics subreddit and quickly attracted a lot of attention.

In their simulation, /u/abraxasknister has a fixed center (block dot) that the first mass (red dot) is attached to (by an invisible rigid massless bar). The second mass (blue dot) is then attached to the first mass (also by an invisible rigid massless bar). They then release these two masses from rest at some initial height and watch what happens.

The resulting dynamics are at right.

It is certainly unpredictable and complicated. Chaotic? Most importantly, it is obviously wrong.

But because the double pendulum is a famous chaotic system, some people did not want to acknowledge that there is an obvious mistake. They wanted to hide behind chaos: they claimed that for a complex system, we cannot possibly have intuitions about how the system should behave.

In this post, I want to discuss the error of hiding behind chaos, and how the distinction between microdynamics and global properties lets us catch /u/abraxasknister’s mistake.
Read more of this post

## Span programs as a linear-algebraic representation of functions

I feel like TheEGG has been a bit monotone in the sort of theoretical computer science that I’ve been writing about recently. In part, this has been due to time constraints and the pressure of the weekly posting schedule (it has now been over a year with a post every calendar week); and in part due to my mind being too fixated on algorithmic biology.

So for this week, I want to change things up a bit. I want to discuss some of the math behind a success of cstheory applied to nature: quantum computing. It’s been six years since I blogged about quantum query complexity and the negative adversary method for lower bounding it. And it has been close to 8 years since I’ve worked on the topic.

But I did promise to write about span programs — a technique used to reason about query complexity. So in this post, I want to shift gears to quantum computing and discuss span programs. I doubt this is useful to thinking about evolution, but it never hurts to discuss a cool linear-algebraic representation of functions.

I started writing this post for the CSTheory Community Blog. Unfortunately, that blog is largely defunct. So, after 6 years, I decided to post on TheEGG instead.

Please humour me, dear reader.

## Space-time maps & tracking colony size with OpenCV in Python

One of the things that the Department of Integrated Mathematical Oncology at the Moffitt Cancer Center is doing very well, is creating an atmosphere that combines mathematics and experiment in cancer. Fellow TheEGG blogger, Robert Vander Velde is one of the new generation of cancer researchers who are combining mathematics and experiment. Since I left Tampa, I’ve had less opportunity to keep up with the work at the IMO, but occasionally I catch up on Slack.

A couple of years ago, Robert had a computer science question. One at the data analysis and visualization stage of the relationship between computer science and cancer. Given that I haven’t posted code on TheEGG in a long time, I thought I’d share some visualizations I wrote to address Robert’s question.

There are many ways to measure the size of populations in biology. Given that we use it in our game assay, I’ve written a lot about using time-lapse microscopy of evolving populations. But this isn’t the only — or most popular — approach. It is much more common to dillute populations heavily and then count colony forming units (CFUs). I’ve discussed this briefly in the context of measuring stag-hunting bacteria.

But you can also combine both approaches. And do time-lapse microscopy of the colonies as they form.

A couple of years ago, Robert Vander Velde Andriy Marusyk were working on experiments that use colony forming units (CFUs) as a measure of populations. However, they wanted to dig deeper into the heterogeneous dynamics of CFUs by tracking the formation process through time-lapsed microscopy. Robert asked me if I could help out with a bit of the computer vision, so I wrote a Python script for them to identify and track individual colonies through time. I thought that the code might be useful to others — or me in the future — so I wanted to write a quick post explaining my approach.

This post ended up trapped in the drafts box of TheEGG for a while, but I thought now is as good a time as any to share it. I don’t know where Robert’s work on this has gone since, or if the space-time visualizations I developed were of any use. Maybe he can fill us in in the comments or with a new guest post.
Read more of this post

## Game landscapes: from fitness scalars to fitness functions

My biology writing focuses heavily on fitness landscapes and evolutionary games. On the surface, these might seem fundamentally different from each other, with their only common feature being that they are both about evolution. But there are many ways that we can interconnect these two approaches.

The most popular connection is to view these models as two different extremes in terms of time-scale.

When we are looking at evolution on short time-scales, we are primarily interested which of a limited number of extant variants will take over the population or how they’ll co-exist. We can take the effort to model the interactions of the different types with each other, and we summarize these interactions as games.

But when we zoom out to longer and longer timescales, the importance of these short term dynamics diminish. And we start to worry about how new types arise and take over the population. At this timescale, the details of the type interactions are not as important and we can just focus on the first-order: fitness. What starts to matter is how fitness of nearby mutants compares to each other, so that we can reason about long-term evolutionary trajectories. We summarize this as fitness landscapes.

From this perspective, the fitness landscapes are the more foundational concept. Games are the details that only matter in the short term.

But this isn’t the only perspective we can take. In my recent contribution with Peter Jeavons to Russell Rockne’s 2019 Mathematical Oncology Roadmap, I wanted to sketch a different perspective. In this post I want to sketch this alternative perspective and discuss how ‘game landscapes’ generalize the traditional view of fitness landscapes. In this way, the post can be viewed as my third entry on progressively more general views of fitness landscapes. The previous two were on generalizing the NK-model, and replacing scalar fitness by a probability distribution.

In this post, I will take this exploration of fitness landscapes a little further and finally connect to games. Nothing profound will be said, but maybe it will give another look at a well-known object.

## Constant-sum games as a way from non-cell autonomous processes to constant tumour growth rate

A lot of thinking in cancer biology seems to be focused on cell-autonomous processes. This is the (overly) reductive view that key properties of cells, such as fitness, are intrinsic to the cells themselves and not a function of their interaction with other cells in the tumour. As far as starting points go, this is reasonable. But in many cases, we can start to go beyond this cell-autonomous starting point and consider non-cell-autonomous processes. This is when the key properties of a cell are not a function of just that cell but also its interaction partners. As an evolutionary game theorist, I am clearly partial to this view.

Recently, I was reading yet another preprint that has observed non-cell autonomous fitness in tumours. In this case, Johnson et al. (2019) spotted the Allee effect in the growth kinetics of cancer cells even at extremely low densities (seeding in vitro at <200 cells in a 1 mm^3 well). This is an interesting paper, and although not explicitly game-theoretic in its approach, I think it is worth reading for evolutionary game theorists.

Johnson et al.'s (2019) approach is not explicitly game-theoretic because they consider their in vitro populations as a monomorphic clonal line, and thus don't model interactions between types. Instead, they attribute non-cell autonomous processes to density dependence of the single type on itself. In this setting, they reasonably define the cell-autonomous null-model as constant exponential growth, i.e. $\dot{N}_T = w_TN_T$ for some constant fitness $w_T$ and total tumour size $N_T$.

It might also be tempting to use the same model to capture cell-autonomous growth in game-theoretic models. But this would be mistaken. For this is only effectively cell-autonomous at the level of the whole tumour, but could hide non-cell-autonomous fitness at the level of the different types that make up the tumour. This apparent cell-autonomous total growth will happen whenever the type interactions are described by constant-sum games.

Given the importance of constant-sum games (more famously known as zero-sum games) to the classical game theory literature, I thought that I would write a quick introductory post about this correspondence between non-cell autonomous constant-sum games and effectively cell-autonomous growth at the level of the whole tumour.

## Quick introduction: Evolutionary game assay in Python

It’s been a while since I’ve shared or discussed code on TheEGG. So to avoid always being too vague and theoretical, I want to use this post to explain how one would write some Python code to measure evolutionary games. This will be an annotated sketch of the game assay from our recent work on measuring evolutionary games in non-small cell lung cancer (Kaznatcheev et al., 2019).

The motivation for this post came about a month ago when Nathan Farrokhian was asking for some advice on how to repeat our game assay with a new experimental system. He has since done so (I think) by measuring the game between Gefitinib-sensitive and Gefitinib-resistant cell types. And I thought it would make a nice post in the quick introductions series.

Of course, the details of the system don’t matter. As long as you have an array of growth rates (call them yR and yG with corresponding errors yR_e and yG_e) and initial proportions of cell types (call them xR and xG) then you could repeat the assay. To see how to get to this array from more primitive measurements, see my old post on population dynamics from time-lapse microscopy. It also has Python code for your enjoyment.

In this post, I’ll go through the two final steps of the game assay. First, I’ll show how to fit and visualize fitness functions (Figure 3 in Kaznatcheev et al., 2019). Second, I’ll transform those fitness functions into game points and plot (Figure 4b in Kaznatcheev et al., 2019). I’ll save discussions of the non-linear game assay (see Appendix F in Kaznatcheev et al., 2019) for a future post.
Read more of this post