# Minimal models for explaining unbounded increase in fitness

October 27, 2018
by Artem Kaznatcheev

On a prior version of my paper on computational complexity as an ultimate constraint, Hemachander Subramanian made a good comment and question:

> Nice analysis Artem! If we think of the fitness as a function of genes, interactions between two genes, and interactions between three genes and so on, your analysis using epistasis takes into account only the interactions (second order and more). The presence or absence of the genes themselves (first order) can change the landscape itself, though. Evolution might be able to play the game of standing still as the landscape around it changes until a species is “stabilized” by finding itself in a peak. The question is would traversing these time-dependent landscapes for optima is still uncomputable?

And although I responded to his comment in the bioRxiv Disqus thread, it seems that comments are version locked and so you cannot see Hema’s comment anymore on the newest version. As such, I wanted to share my response on the blog and expand a bit on it.

Mostly this will be an incomplete argument for why biologists should care about worst-case analysis. I’ll have to expand on it more in the future.

For me, one of the beauties of computer science is showing rigorously that a very ‘simple’ model has some surprisingly hard feature. Then any more complicated model that contains the simple model as a ‘submodel’ will also be hard. This is part of the reason why I pick finite static landscapes and use worst-case analysis.

Let’s start with worst-case analysis.

The reasons for using worst-case analysis are many. But to focus on the most relevant one here: ease of transfer. Doing the first worst-case analysis of fitness landscapes might be more difficult than doing some of the random-landscape analyses done before — probably one of the reasons that I could still do it 30 years after the relevant models were developed. But once you’ve done this analysis, its results transfer to more expressive models. A more expressive model will be able to express the simple model as one of its cases. Thus, its worst case will be at least as bad as the simple model’s worst case.

This straightforward stacking of results does not work for average-case analysis. Since the simple model is usually a tiny fraction of the models realizable by a more expressive model, the simple model’s average behaviour will have little effect on the average behaviour of the more expressive model. This is a two-pronged observation.
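To make the asymmetry concrete, here is a minimal numeric sketch (the numbers are my own toy illustration, not from the paper): think of each number as the “cost” of solving one instance of a model, with the simple model embedded as a handful of instances inside a much larger expressive model.

```python
# Toy "cost" per instance of a simple model: one hard instance among easy ones.
simple_model = [1, 1, 1, 100]

# A more expressive model contains every simple instance as a special case,
# plus many easy instances of its own (1000 instances total).
expressive_model = simple_model + [2] * 996

worst_simple = max(simple_model)
worst_expressive = max(expressive_model)
avg_simple = sum(simple_model) / len(simple_model)
avg_expressive = sum(expressive_model) / len(expressive_model)

# Worst-case hardness transfers: the expressive model is at least as hard,
# because the hard simple instance is still expressible inside it.
assert worst_expressive >= worst_simple

# Average-case hardness is diluted: the hard instance barely moves the mean.
print(avg_simple, avg_expressive)  # 25.75 2.095
```

The worst case of the big model is pinned from below by the embedded simple model, while the average is dominated by the 996 instances the simple model says nothing about.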

On the first prong: suppose we have a good reason (preferably one that is independent of the model in question) to believe in a particular distribution over the objects we’re studying. In other words, we’re using average-case analysis not because it is easy but because we know the distribution. In that case, if the simpler model is a tiny fraction of the distribution then knowing its properties is not necessarily useful to us. This is what people usually like to dismiss as special-case or knife-edge effects. In this case, worst-case analysis is not helpful.

But there is the second prong: suppose we don’t have a good reason to believe in a particular distribution over the objects in question. This is the case if we’re using average-case analysis because it’s easy or because we think it’s “assumption free”. Of course, no distribution is assumption free. In that case, it is better to know the logical properties of the model description and separate them from effects due to the distribution over objects. This is where worst-case analysis shines. And I believe that this is also the relevant case for fitness landscapes.

Worst-case analysis lets us stack results together much better than average-case analysis. Even though the simple model is a tiny fraction of what is expressible by the more complicated model, it is still expressible. Thus, its difficult aspects are still expressible. That is why theoretical computer scientists have built up such a rich network of complexity results and reductions. Of course, in the more complex model it might be even easier to prove hardness results directly, since you have more freedom, and some hardness results might become trivial (as we’ll see below).

I choose finite static fitness landscapes because I think many would consider them to be a very simple model. And more importantly, models of dynamic landscapes often have finite static landscapes as a special case and so hardness results transfer. In particular, I don’t need to believe that finite static fitness landscapes are a perfect or adequate model of evolution. Instead, I just need to believe that any adequate model of evolution will be able to express finite static fitness landscapes as a submodel.

So let’s see some examples.

More specifically to Hema’s point: there are several ways you can incorporate the absence of genes into my work. The most straightforward way is to interpret a “0” at a given position in the “genome” as marking the absence of a gene. In that case, if you have a bound on the total “possible” number of genes then the analysis remains unchanged (just your initial genotype is padded with zeros).
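Here is a minimal sketch of that padding trick (the encoding and fitness function are my own stand-ins, not from the paper): with a bound on the total number of possible genes, a shorter genotype is just its zero-padded version, and any fitness function defined on full-length genotypes applies unchanged.

```python
N_MAX = 8  # assumed bound on the total number of possible genes

def pad(genotype, n_max=N_MAX):
    """Pad a genotype with 0s (absent genes) up to the bound."""
    return tuple(genotype) + (0,) * (n_max - len(genotype))

def fitness(genotype):
    """A stand-in fitness function on full-length genotypes:
    each present gene contributes its (1-indexed) position."""
    return sum(i * g for i, g in enumerate(pad(genotype), start=1))

# A genotype that currently carries only three of the possible gene slots:
g = (1, 0, 1)
print(pad(g))      # (1, 0, 1, 0, 0, 0, 0, 0)
print(fitness(g))  # 1*1 + 3*1 = 4
```

Since `fitness(pad(g)) == fitness(g)`, absence-of-gene dynamics reduce to ordinary bit-flip dynamics on the padded landscape, which is why the hardness analysis goes through unchanged.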

If the number of possible genes is unbounded — which might very well be a realistic assumption — then you aren’t dealing with a finite landscape. In particular, you will need a theory for how new genes are introduced and how they interconnect with existing genes. This will inherently be a more complicated theory, so it’ll be less surprising that it allows for complex dynamics.

In particular, on an infinite landscape, there are simpler ways than combinatorial complexity to get unreachable optima: simply postulate that there is always a chance for a new gene to arise in a smooth fitness landscape. In this approach you wouldn’t even need any combinatorial structure; you could just have an infinite tree where a new branch is always possible. This is kind of what existing models of unbounded evolution do. I’ve previously described this family of models as Orr-Gillespie theory.
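A minimal sketch of that infinite-tree idea (the dynamics and parameters here are my own toy choices, not a model from the paper): if at every step there is some chance for a new beneficial gene to arise, fitness keeps increasing by assumption, with no combinatorial structure doing any work.

```python
import random

random.seed(0)

def grow(fitness, steps, p_new_gene=0.5):
    """Each step, a new gene arises with probability p_new_gene and adds a
    small fitness increment; a further branch is always possible, so nothing
    ever traps the walk at a local peak."""
    trajectory = [fitness]
    for _ in range(steps):
        if random.random() < p_new_gene:  # a new gene arises
            fitness += random.uniform(0.0, 1.0)
        trajectory.append(fitness)
    return trajectory

traj = grow(1.0, 1000)
assert all(b >= a for a, b in zip(traj, traj[1:]))  # fitness never decreases
print(traj[-1])  # grows without bound as steps increase
```

The “explanation” of unbounded increase is visibly baked into `p_new_gene`: the model assumes a fresh branch is always available rather than deriving open-endedness from the structure of the landscape.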

I find this a bit unsatisfying, however. It doesn’t so much explain unbounded evolution as assume it.

However, Richard Lenski has pointed out that these sorts of models can be a good fit for moderate evolutionary time scales (which is what they aim to explain, with the LTEE in particular), where the number of new mutations is smaller than the number of genes (or where the number of generations is on the order of the number of genes). In such a linear regime, I don’t think you’d see much difference between some hard finite landscape and these infinite tree landscapes.

But it is important that these infinite ‘landscapes’ lack the combinatorial structure that was central to fitness landscapes. So we probably shouldn’t call the two by the same name.
