The gene-interaction networks of easy fitness landscapes

Since evolutionary fitness landscapes have been a recurrent theme on TheEGG, I want to return, yet again, to the question of finding local peaks in fitness landscapes. In particular, to the distinction between easy and hard fitness landscapes.

Roughly, in easy landscapes, we can find local peaks quickly and in hard ones, we cannot. But this is very vague. To be a little more precise, I have to borrow the notion of orders of growth from the asymptotic analysis standard in computer science. A family of landscapes indexed by a size n (usually corresponding to the number of genes in the landscape) is easy if a local fitness optimum can be found in the landscapes in time polynomial in n and hard otherwise. In the case of hard landscapes, we can’t guarantee to find a local fitness peak and thus can sometimes reason from a state of perpetual maladaptive disequilibrium.
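To make "finding a local peak" concrete, here is a minimal sketch of an adaptive walk on a smooth landscape, the simplest kind of easy landscape. Everything in it is an illustrative assumption rather than the construction from the papers: genotypes are bit-strings, neighbours differ by one point mutation, and the hypothetical `smooth_fitness` gives each locus an independent integer contribution.

```python
import random

def smooth_fitness(weights):
    """Hypothetical smooth landscape: each locus contributes independently."""
    def fitness(genotype):
        return sum(w * g for w, g in zip(weights, genotype))
    return fitness

def adaptive_walk(fitness, start, rng):
    """Move to a random fitter one-point mutant until none exists."""
    genotype = list(start)
    steps = 0
    while True:
        current = fitness(genotype)
        # Collect every locus whose flip strictly increases fitness.
        improving = []
        for i in range(len(genotype)):
            genotype[i] ^= 1
            if fitness(genotype) > current:
                improving.append(i)
            genotype[i] ^= 1  # undo the trial flip
        if not improving:
            # No fitter neighbour: this genotype is a local peak.
            return tuple(genotype), steps
        genotype[rng.choice(improving)] ^= 1
        steps += 1

n = 12
rng = random.Random(0)
weights = [rng.choice([-3, -2, -1, 1, 2, 3]) for _ in range(n)]
start = [rng.randint(0, 1) for _ in range(n)]
peak, steps = adaptive_walk(smooth_fitness(weights), start, rng)
print(f"reached a local peak after {steps} of at most {n} steps")
```

On this smooth landscape each step fixes one mismatched locus for good, so the walk ends within n steps. The hard landscapes discussed here are exactly those where no such polynomial bound on the number of steps is available.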

In Kaznatcheev (2019), I introduced this distinction to biology. Since hard landscapes have more interesting properties, which are more challenging to theoretical biologists’ intuitions, I focused more on them. This was read (perhaps rightly) as me advocating for the existence or ubiquity of hard landscapes, with the implication that if hard landscapes don’t occur in nature then my distinction is pointless. But I don’t think this is the most useful reading.

It certainly would be fun if hard landscapes were a feature of nature since they give us a new way to approach certain puzzles like the maintenance of cooperation, the evolution of costly learning, or open-ended evolution. But this is an empirical question. What isn’t a question is that hard landscapes are a feature of our mental and mathematical models of evolution. As such, all (or most, whatever that means) fitness landscapes being easy is still exciting for me. It means that the easy vs hard distinction can push us to refine our mental models such that if only easy landscapes occur in nature then our models should only be able to express easy landscapes.

In other words, using computational complexity to build upper-bounds arguments (that on certain classes of landscapes, local optima can be found efficiently) can be just as fun as lower-bounds arguments (that on certain classes of landscapes, evolution requires at least a super-polynomial effort to find any local fitness peak). However, apart from a brief mention of smooth landscapes, I did not stress the upper-bounds in Kaznatcheev (2019).

Now, together with David Cohen and Peter Jeavons, I’ve taken this next step — at least in the cstheory context, we still need to write on the biology. So in this post, I want to talk briefly about a biological framing of Kaznatcheev, Cohen & Jeavons (2019) and the kind of fitness landscapes that are easy for evolution.


Fighting about frequency and randomly generating fitness landscapes

A couple of months ago, I was in Cambridge for the Evolution Evolving conference. It was a lot of fun, and it was nice to catch up with some familiar faces and meet some new ones. My favourite talk was Karen Kovaka’s “Fighting about frequency”. It was an extremely well-delivered talk on the philosophy of science. And it engaged with a topic that has been very important to discussions of my own recent work, although in my case it is on a much smaller scale than the general phenomenon that Kovaka was concerned with.

Let me first set up my own teacup, before discussing the more general storm.

Recently, I’ve had a number of chances to present my work on computational complexity as an ultimate constraint on evolution. And some questions have repeated again and again after several of the presentations. I want to address one of these persistent questions in this post.

How common are hard fitness landscapes?

This question has come up during review, presentations, and emails (most recently from Jianzhi Zhang’s reading group). I’ve spent some time addressing it in the paper. But it is not a question with a clear answer. So unsurprisingly, my comments have not been clear. Hence, I want to use this post to add some clarity.


Introduction to Algorithmic Biology: Evolution as Algorithm

As Aaron Roth wrote on Twitter — and as I bet with my career: “Rigorously understanding evolution as a computational process will be one of the most important problems in theoretical biology in the next century. The basics of evolution are many students’ first exposure to “computational thinking” — but we need to finish the thought!”

Last week, I tried to continue this thought for Oxford students at a joint meeting of the Computational Society and Biological Society. On May 22, I gave a talk on algorithmic biology. I want to use this post to share my (shortened) slides as a pdf file and give a brief overview of the talk.

Figure: Winding path in a hard semi-smooth landscape

If you didn’t get a chance to attend, maybe the title and abstract will get you reading further:

Algorithmic Biology: Evolution is an algorithm; let us analyze it like one.

Evolutionary biology and theoretical computer science are fundamentally interconnected. In the work of Charles Darwin and Alfred Russel Wallace, we can see the emergence of concepts that theoretical computer scientists would later hold as central to their discipline: ideas like asymptotic analysis, the role of algorithms in nature, distributed computation, and analogy from man-made to natural control processes. By recognizing evolution as an algorithm, we can continue to apply the mathematical tools of computer science to solve biological puzzles – to build an algorithmic biology.

One of these puzzles is open-ended evolution: why do populations continue to adapt instead of getting stuck at local fitness optima? Or alternatively: what constraint prevents evolution from finding a local fitness peak? Many solutions have been proposed to this puzzle, with most being proximal – i.e. depending on the details of the particular population structure. But computational complexity provides an ultimate constraint on evolution. I will discuss this constraint, and the positive aspects of the resultant perpetual maladaptive disequilibrium. In particular, I will explain how we can use this to understand both on-going long-term evolution experiments in bacteria and the evolution of costly learning and cooperation in populations of complex organisms like humans.

Unsurprisingly, I’ve written about all these topics already on TheEGG, and so my overview of the talk will involve a lot of links back to previous posts. In this way, this can serve as an analytic linkdex on algorithmic biology.

British agricultural revolution gave us evolution by natural selection

This Wednesday, I gave a talk on algorithmic biology to the Oxford Computing Society. One of my goals was to show how seemingly technology-oriented disciplines (such as computer science) can produce foundational theoretical, philosophical and scientific insights. So I started the talk with the relationship between domestication and natural selection, something that I’ve briefly discussed on TheEGG in the past.

Today we might discuss artificial selection or domestication (or even evolutionary oncology) as applying the principles of natural selection to achieve human goals. This is only because we now take Darwin’s work as given. At the time that he was writing, however, Darwin actually had to make his argument in the other direction. Darwin’s argument proceeds from looking at the selection algorithm used by human breeders and then abstracting it to focus only on the algorithm and not the agent carrying out the algorithm. Having made this abstraction, he can implement the breeder by the distributed struggle for existence and thus get natural selection.

The inspiration is clearly from the technological to the theoretical. But there is a problem with my story.

Domestication of plants and animals is ancient. Old enough that we have cancers that arose in our domesticated helpers 11,000 years ago and persist to this day. Domestication in general (the fruit of the first agricultural revolution) can hardly qualify as a new technology in Darwin’s day. It would have been just as known to Aristotle, and yet he thought species were eternal.

Why wasn’t Aristotle or any other ancient philosopher inspired by the agriculture and animal husbandry of their day to arrive at the same theory as Darwin?

The ancients didn’t arrive at the same view because it wasn’t the domestication of the first agricultural revolution that inspired Darwin. It was something much more contemporary to him. Darwin was inspired by the British agricultural revolution of the 18th and early 19th centuries.

In this post, I want to sketch this connection between the technological development of the Georgian era and the theoretical breakthroughs in natural science in the subsequent Victorian era. As before, I’ll focus on evolution and algorithm.


Local maxima and the fallacy of jumping to fixed-points

An economist and a computer scientist are walking through the University of Chicago campus discussing the efficient markets hypothesis. The computer scientist spots something on the pavement and exclaims: “look at that $20 on the ground — seems we’ll be getting a free lunch today!”

The economist turns to her without looking down and replies: “Don’t be silly, that’s impossible. If there was a $20 bill there then it would have been picked up already.”

This is the fallacy of jumping to fixed-points.

In this post I want to discuss both the importance and power of local maxima, and the dangers of simply assuming that our system is at a local maximum.

So before we dismiss the economist’s remark with laughter, let’s look at a more convincing discussion of local maxima that falls prey to the same fallacy. I’ll pick on one of my favourite YouTubers, THUNK:

In his video, THUNK discusses a wide range of local maxima and contrasts them with the intended global maximum (or more desired local maxima). He first considers a Roomba vacuum cleaner that is trying to maximize the area that it cleans but gets stuck in the local maximum of his chair’s legs. And then he goes on to discuss similar cases in physics, chemistry, evolution, psychology, and culture.

It is a wonderful set of examples and a nice illustration of the power of fixed-points.

But given that I write so much about algorithmic biology, let’s focus on his discussion of evolution. THUNK describes evolution as follows:

Evolution is a sort of hill-climbing algorithm. One that has identified local maxima of survival and replication.

This is a common characterization of evolution. And it seems much less silly than the economist passing up $20. But it is still an example of the fallacy of jumping to fixed-points.

My goal in this post is to convince you that THUNK describing evolution and the economist passing up $20 are actually using the same kind of argument. Sometimes this is a very useful argument, but sometimes it is just a starting point that without further elaboration becomes a fallacy.


Fitness distributions versus fitness as a summary statistic: algorithmic Darwinism and supply-driven evolution

For simplicity, especially in the fitness landscape literature, fitness is often treated as a scalar — usually a real number. If our fitness landscape is on genotypes then each genotype has an associated scalar value of fitness. If our fitness landscape is on phenotypes then each phenotype has an associated scalar value of fitness.

But this is a little strange. After all, two organisms with the same genotype or phenotype don’t necessarily have the same number of offspring or other life outcomes. As such, we’re usually meant to interpret the value of fitness as the mean of some random variable like number of children. But is the mean the right summary statistic to use? And if it is then which mean: arithmetic or geometric or some other?
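As a small illustration of why the choice of mean matters, the sketch below compares two made-up offspring-number samples (the numbers are hypothetical, chosen only for the arithmetic). Both have the same arithmetic mean, but the more variable one has a lower geometric mean, which is the relevant summary when numbers multiply from one generation to the next.

```python
from statistics import mean, geometric_mean

# Hypothetical offspring counts for two types over four generations.
steady = [2, 2, 2, 2]
variable = [1, 1, 3, 3]

am_steady, am_variable = mean(steady), mean(variable)
gm_steady, gm_variable = geometric_mean(steady), geometric_mean(variable)

print(f"arithmetic means: {am_steady} vs {am_variable}")
print(f"geometric means:  {gm_steady:.3f} vs {gm_variable:.3f}")
```

A single scalar fitness has to pick one of these summaries, and the two types are tied under one and ranked under the other.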

One way around this is to simply not use a summary statistic, and instead treat fitness as a random variable with a corresponding distribution. For many developmental biologists, this would still be a simplification since it ignores many other aspects of life-histories — especially related to reproductive timing. But it is certainly an interesting starting point. And one that I don’t see pursued enough in the fitness landscape literature.

The downside is that it makes an already pretty vague and unwieldy model (i.e. the fitness landscape) even less precise and even more unwieldy. As such, we should pursue this generalization only if it brings us something concrete and useful. In this post I want to discuss two aspects of this: better integration of evolution with computational learning theory and thinking about supply-driven evolution (i.e. arrival of the fittest). In the process, I’ll be drawing heavily on the thoughts of Leslie Valiant and Julian Z. Xue.


Quick introduction: Generalizing the NK-model of fitness landscapes

As regular readers of TheEGG know, I’ve been interested in fitness landscapes for many years. At its most basic, a fitness landscape is an almost unworkably vague idea: it is just a mapping from some description of organisms (usually a string corresponding to a genotype or phenotype) to fitness, alongside some notion of locality, i.e. some descriptions being closer to each other than to some other descriptions. Usually, fitness landscapes are studied over combinatorially large genotypic spaces on many loci, with locality coming from something like point mutations at each locus. These spaces are exponentially large in the number of loci. As such, no matter how rapidly next-generation sequencing and fitness assays expand, we will not be able to treat a fitness landscape as simply an array of numbers and measure each fitness, at least for any moderate or larger number of genes.

The space is just too big.

As such, we can’t consider an arbitrary mapping from genotypes to fitness. Instead, we need to consider compact representations.

Ever since Julian Z. Xue first introduced me to it, my favorite compact representation has probably been the NK-model of fitness landscapes. In this post, I will rehearse the definition of what I’d call the classic NK-model. But I’ll then consider how the model would have been defined if it had originally been proposed by a mathematician or computer scientist. I’ll call this the generalized NK-model and argue that it isn’t only mathematically more natural but also biologically more sensible.
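For readers who have not met it, here is a minimal sketch of the NK-model in its usual textbook form, which may differ in details from the definition rehearsed in the full post. The assumptions are mine: each of the n loci depends on itself and k other loci chosen at random, each locus has a lookup table of uniformly random contributions, and fitness is the average contribution. The name `make_nk_landscape` is hypothetical.

```python
import itertools
import random

def make_nk_landscape(n, k, seed=0):
    """Build an NK-style fitness function on bit-strings of length n."""
    rng = random.Random(seed)
    # Locus i interacts with itself and k other randomly chosen loci.
    neighbours = [
        [i] + rng.sample([j for j in range(n) if j != i], k)
        for i in range(n)
    ]
    # One table per locus: a random contribution for each of the
    # 2**(k+1) joint states of the locus and its k partners.
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = sum(
            tables[i][tuple(genotype[j] for j in neighbours[i])]
            for i in range(n)
        )
        return total / n

    return fitness, neighbours, tables

fitness, neighbours, tables = make_nk_landscape(n=20, k=3)
stored = sum(len(table) for table in tables)
print(f"{stored} stored numbers describe a landscape on {2 ** 20} genotypes")
```

The point of the compact representation is visible in the last line: the description grows as n times 2^(k+1), while the genotype space grows as 2^n.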

Supply and demand as driving forces behind biological evolution

Recently I was revisiting Xue et al. (2016) and Julian Xue’s thoughts on supply-driven evolution more generally. I’ve been fascinated by this work since Julian first told me about it. But only now did I realize the economic analogy that Julian is making. So I want to go through this Mutants as Economic Goods metaphor in a bit of detail. A sort of long-delayed follow-up to my post on evolution as a risk-averse investor (and another among many links between evolution and economics).

Let us start by viewing the evolving population as a market — focusing on the genetic variation in the population, in particular. From this view, each variant or mutant trait is a good. Natural selection is the demand. It prefers certain goods over others and ‘pays more’ for them in the currency of fitness. Mutation and the genotype-phenotype map that translates individual genetic changes into selected traits is the supply. Both demand and supply matter to the evolutionary economy. But as a field, we’ve put too much emphasis on the demand — survival of the fittest — and not enough emphasis on the supply — arrival of the fittest. This accusation of too much emphasis on demand has usually been raised against the adaptationist program.

The easiest justification for the demand focus of the adaptationist program has been one of model simplicity, similar to the complete market models in economics. If we assume isotropic mutations (i.e. there is the same unbiased chance of a trait to mutate in any direction on the fitness landscape) then surely mutation isn’t an important force in evolution. As long as the right genetic variance is available then nature will be able to select it and we can ignore further properties of the mutation operator. We can make a demand-based theory of evolution.

But if only life was so simple.

Open-ended evolution on hard fitness landscapes from VCSPs

There is often interest among the public and in the media about evolution and its effects for contemporary humans. In this context, some argue that humans have stopped evolving, including people with considerable influence over public opinion. Famous BBC Natural History Unit broadcaster David Attenborough, for example, argued a few years ago in an interview that humans are the only species who “put halt to natural selection of its own free will”. The first time I read this, I thought that it seemed plausible. The advances in medicine that we made in the last two centuries mean that almost all babies can reach adulthood and have children of their own, which appears to cancel natural selection. However, after more careful thought, I realized that these sorts of arguments for the ‘end of evolution’ could not be true.

Upon more reflection, there just seem to be better arguments for open-ended evolution.

One way of seeing that we’re still evolving is by observing that we actually created a new environment, with very different struggles than the ones that we encountered in the past. This is what Adam Benton (2013) suggests in his discussion of Attenborough. Living in cities with millions of people is very different from having to survive in a prehistoric jungle, so evolutionary pressures have shifted in this new environment. Success and fitness are measured differently. The continuing pace of change in various fields such as technology, medicine, and the sciences is a clear example that humans continue to evolve. Even from a physical point of view, research shows that we are now becoming taller, after the effects of the last ice age faded out (Yang et al., 2010), while our brain seems to get smaller, for various reasons with the most amusing being that we don’t need that much “central heating”. Take that Aristotle! Furthermore, the shape of our teeth and jaws changed as we changed our diet, with different populations having a different structure based on the local diet (von Cramon-Taubadel, 2011).

But we don’t even need to resort to dynamically changing selection pressures. We can argue that evolution is ongoing even in a static environment. More importantly, we can make this argument in the laboratory, although we do have to switch from humans to a more prolific species. A good example of this would be Richard Lenski’s long-term E. coli evolution experiment (Lenski et al., 1991), which shows that evolution is still ongoing after 50,000 generations in the E. coli bacteria (Wiser et al., 2013). The fitness of the E. coli keeps increasing! This certainly seems like open-ended evolution.

But how do we make theoretical sense of these experimental observations? Artem Kaznatcheev (2018) has one suggestion: ‘hard’ landscapes due to the constraints of computational complexity. He suggests that evolution can be seen as a computational problem, in which the organisms try to maximize their fitness over successive generations. This problem would still be constrained by the theory of computational complexity, which tells us that some problems are too hard to be solved in a reasonable amount of time. Unfortunately, Artem’s work is far too theoretical. This is where my third-year project at the University of Oxford comes in. I will be working together with Artem on actually simulating open-ended evolution on specific examples of hard fitness landscapes that arise from valued constraint satisfaction problems (VCSPs).

Why VCSPs? They are an elegant generalization of the weighted 2SAT problem that Artem used in his work on hard landscapes. I’ll use this blog post to introduce CSPs and VCSPs, explain how they generalize weighted 2SAT (and thus the NK fitness landscape model), and provide a way to translate between the language of computer science and that of biology.


Hobbes on knowledge & computer simulations of evolution

Earlier this week, I was at the Second Joint Congress on Evolutionary Biology (Evol2018). It was overwhelming, but very educational.

Many of the talks were about very specific evolutionary mechanisms in very specific model organisms. This diversity of questions and approaches to answers reminded me of the importance of bouquets of heuristic models in biology. But what made this particularly overwhelming for me as a non-biologist was the lack of a unifying formal framework to make sense of what was happening. Without the encyclopedic knowledge of a good naturalist, I had a very difficult time linking topics to each other. I was experiencing the pluralistic nature of biology. This was stressed by Laura Nuño De La Rosa’s slide that contrasts the pluralism of biology with the theory reduction of physics:

That’s right, to highlight the pluralism, there were great talks from philosophers of biology alongside all the experimental and theoretical biology at Evol2018.

As I’ve discussed before, I think that theoretical computer science can provide the unifying formal framework that biology needs. In particular, the cstheory approach to reductions is the more robust (compared to physics) notion of ‘theory reduction’ that a pluralistic discipline like evolutionary biology could benefit from. However, I still don’t have any idea of how such a formal framework would look in practice. Hence, throughout Evol2018 I needed refuge from the overwhelming overstimulation of organisms and mechanisms that were foreign to me.

One of the places I sought refuge was in talks on computational studies. There, I heard speakers emphasize several times that they weren’t “just simulating evolution” but that their programs were evolution (or evolving) in a computer. Not only were they looking at evolution in a computer, but this model organism gave them an advantage over other systems because of its transparency: they could track every lineage, every offspring, every mutation, and every random event. Plus, computation is cheaper and easier than culturing E. coli, brewing yeast, or raising fruit flies. And just like those model organisms, computational models could test evolutionary hypotheses and generate new ones.

This defensive emphasis surprised me. It suggested that these researchers have often been questioned on the usefulness of their simulations for the study of evolution.

In this post, I want to reflect on some reasons for such questioning.
