The gene-interaction networks of easy fitness landscapes

Since evolutionary fitness landscapes have been a recurrent theme on TheEGG, I want to return, yet again, to the question of finding local peaks in fitness landscapes. In particular, to the distinction between easy and hard fitness landscapes.

Roughly, in easy landscapes, we can find local peaks quickly and in hard ones, we cannot. But this is very vague. To be a little more precise, I have to borrow the notion of orders of growth from the asymptotic analysis standard in computer science. A family of landscapes indexed by a size n (usually corresponding to the number of genes in the landscape) is easy if a local fitness optimum can be found in the landscapes in time polynomial in n and hard otherwise. In the case of hard landscapes, we can’t guarantee to find a local fitness peak and thus can sometimes reason from a state of perpetual maladaptive disequilibrium.

In Kaznatcheev (2019), I introduced this distinction to biology. Since hard landscapes have more interesting properties which are more challenging to theoretical biologist’s intuitions, I focused more on this. This was read — perhaps rightly — as me advocating for the existence or ubiquity of hard landscapes. And that if hard landscapes don’t occur in nature then my distinction is pointless. But I don’t think this is the most useful reading.

It certainly would be fun if hard landscapes were a feature of nature since they give us a new way to approach certain puzzles like the maintenance of cooperation, the evolution of costly learning, or open-ended evolution. But this is an empirical question. What isn’t a question is that hard landscape are a feature of our mental and mathematical models of evolution. As such, all — or most, whatever that means — fitness landscapes being easy is still exciting for me. It means that the easy vs hard distinction can push us to refine our mental models such that if only easy landscapes occur in nature then our models should only be able to express easy landscapes.

In other words, using computational complexity to build upper-bounds arguments (that on certain classes of landscapes, local optima can be found efficiently) can be just as fun as lower-bounds arguments (that on certain classes of landscapes, evolution requires at least a super-polynomial effort to find any local fitness peak). However, apart from a brief mention of smooth landscapes, I did not stress the upper-bounds in Kaznatcheev (2019).

Now, together with David Cohen and Peter Jeavons, I’ve taken this next step — at least in the cstheory context, we still need to write on the biology. So in this post, I want to talk briefly about a biological framing of Kaznatcheev, Cohen & Jeavons (2019) and the kind of fitness landscapes that are easy for evolution.

First, I need to address one more ambiguity in the definition of easy vs hard landscapes. This one is more of a feature than a bug. And it is the ambiguity over “can be found”. In particular, what class of algorithms are we considering? In Kaznatcheev (2019), I use this ambiguity to consider three progressively more restrictive views of hard landscape. When talking about semismooth fitness landscapes, I consider a family of landscapes hard if random fitten mutant or fittest mutant strong-selection weak-mutation (SSWM) dynamics cannot find the peak in polynomial time. This is a notion of hard for a particular — although popular — evolutionary model. For the more rugged notion of landscapes, I consider two broader classes. First is any SSWM dynamics — i.e. any dynamic that is localized to a single point and moves only uphill — and second is any polynomial time algorithm — so any evolutionary dynamic, even ones that are distributed over the genotype space, go through valleys, jump around, etc. Obviously, later versions of hardness imply earlier ones, but earlier one do not imply later ones.

A similar issue can arise with upper-bounds. In the cstheory literature, if we want to estalish a complexity upper bound, we just need to find some algorithm that solves the problem efficiently. This isn’t compelling to a biologist. More importantly, it isn’t compelling to me, since it throws away the abstraction over unknown evolutionary dynamics that I wanted to achieve with algorithmic biology. But being easy for any algorithm is impossible: just consider the algorithm that doesn’t do anything or goes strictly downhill instead of up.

As such, we need a reasonable class of evolutionary dynamics as the standard for what makes a landscape easy. In the case of Kaznatcheev, Cohen, and Jeavons (2019), we went for a combinatorial feature of landscapes: longest paths. We said a family of landscapes is easy if all adaptive paths are at most polynomial length. In terms of dynamics, this means that any local adaptive dynamic — i.e. a dynamic that goes strictly uphill in the fitness graph — no matter how silly, will find a local fitness peak in polynomial time.

This is the definition under which smooth landscapes are easy. In a smooth landscape, there is a single peak. Given any point in genotype space, if that point is Hamming-distance m from the peak — i.e. differs from the peak by m mutations — then any adaptive path from that point has length at most m. And since m is always less than or equal to the number of loci n, this means all smooth fitness landscapes are easy.

Our goal was to find a richer class of easy fitness landscapes.

For this, we needed a compact way of representing fitness landscapes. So we spent the first half of the paper defining constraint graphs and equivalence classes of constraint graphs. In biological terminology, these are gene-interaction networks.

By using the results of section 4 of Kaznatcheev, Cohen & Jeavons (2019), we can define a gene-interaction network as follows. Given two loci i and j with alleles in D, we will say there is an interaction between them if there is some assignment y to the other n – 2 loci such that the sublandscape yD^2 has at least sign epistasis. If no such y exists then we say that i and j don’t interact. We then have the gene-interaction network as a graph where the vertices are loci and edges are drawn between any two loci that interact. See section 3 and 4 for how this definition relates to the generalized NK-model.

In this representation, a smooth landscape — which is defined as having no sign-epistasis — would correspond to a gene-interaction network with no edges.

In section 5, we establish with Theorem 6 that in a biallelic system (i.e. D = {0,1}) if the gene-interaction network is a tree (or forest) then it corresponds to an easy fitness landscape. More strictly, we show that the longest adaptive path in a fitness landscapes correspond to any gene-interaction tree on m loci has length at most {n \choose 2} + n.

The proof is cute and I would encourage you to read it. Pun intended — although only clear after reading Definition 12.

We also establish that this result is tight in various ways. For example, there is indeed a biallelic gene-interaction tree (path, in fact) that produces a fitness landscape with an adaptive path of length {n \choose 2} + n. If we move from biallelic to triallelic landscapes then the result disappears: there is a triallelic gene-interaction tree (path, in fact) that produces a fitness landscape with an exponentially long adaptive path. Similarly, for biallelic systems, if we move from trees to gene-interaction networks with treewidth two then there is again a fitness landscape with exponentially long adaptive paths.

There are a number of important caveats here.

First, just because the particular triallelic or biallelic treewidth two landscapes aren’t easy doesn’t mean they are hard. In particular, they produce long walks for silly rules like reluctant adaptive dynamics (i.e. moving to the fitter mutant that has the smallest increase in fitness — least-fit SSWM).

Second, it doesn’t mean there aren’t other classes of easy landscapes that don’t have tree-like gene-interaction networks. An obvious such class is block-models with constant-sized blocks.

But this can give us at least a bit more intuition on what kind of fitness landscapes are easy and why.

References

Kaznatcheev, A. (2019). Computational complexity as an ultimate constraint on evolution.Genetics 212(1): 245.

Kaznatcheev, A., Cohen, D.A., & Jeavons, P.G. (2019). Representing fitness landscapes by valued constraints to understand the complexity of local search. 25th International Conference on Principles and Practice of Constraint Programming arXiv:1907.01218

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.