Fitness distributions versus fitness as a summary statistic: algorithmic Darwinism and supply-driven evolution

For simplicity, especially in the fitness landscape literature, fitness is often treated as a scalar — usually a real number. If our fitness landscape is on genotypes then each genotype has an associated scalar value of fitness. If our fitness landscape is on phenotypes then each phenotype has an associated scalar value of fitness.

But this is a little strange. After all, two organisms with the same genotype or phenotype don’t necessarily have the same number of offspring or other life outcomes. As such, we’re usually meant to interpret the value of fitness as the mean of some random variable like number of children. But is the mean the right summary statistic to use? And if it is then which mean: arithmetic or geometric or some other?

One way around this is to simply not use a summary statistic, and instead treat fitness as a random variable with a corresponding distribution. For many developmental biologists, this would still be a simplification since it ignores many other aspects of life-histories — especially related to reproductive timing. But it is certainly an interesting starting point. And one that I don’t see pursued enough in the fitness landscape literature.

The downside is that it makes an already pretty vague and unwieldy model — i.e. the fitness landscape — even less precise and even more unwieldy. As such, we should pursue this generalization only if it brings us something concrete and useful. In this post I want to discuss two aspects of this: better integration of evolution with computational learning theory, and thinking about supply-driven evolution (i.e. the arrival of the fittest). In the process, I’ll be drawing heavily on the thoughts of Leslie Valiant and Julian Z. Xue.


Both Valiant and Xue consider evolution operating on samples from a fitness distribution. But the interpretation of those samples is very different: the former sees samples as environmental challenges, and the latter sees samples as mutants.

Environmental challenges as samples from a fitness distribution

Today, machine learning has ballooned from a relatively precise discipline into a buzzword. Sometimes it feels like everyone who has ever run a linear regression at some point in their life is now reclassifying themselves as a machine learning expert. There is a lot of hype and a lot of vague claims, mostly about deep learning.

But it is important to remember that amid this vague excitement, there is also a deep history of rigorous theoretical grounding for machine learning: a branch of theoretical computer science that focuses on machine learning algorithms. This algorithmic view of machine learning is known as computational learning theory (CoLT), and one of its biggest early developments was Leslie Valiant’s 1984 introduction of probably approximately correct (PAC) learning.

In the PAC model, we want to learn how to generalize from samples. The learner receives samples from some environment and must select a hypothesis from a certain class of possible functions. The goal is that, with high probability, the hypothesis will have low generalization error — i.e. be approximately correct on new samples drawn from the environment. A given class of possible functions is PAC-learnable if we have an algorithm that can learn it for any desired accuracy, any desired probability of success, and any distribution of the samples, using a number of samples polynomial in these parameters.
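To make this concrete, here is a minimal sketch of the classic elimination algorithm for learning monotone conjunctions, one of the first classes shown to be PAC-learnable. The hidden concept, the variable names, and the sample sizes below are all illustrative choices, not anything from Valiant's paper:

```python
import random

def learn_conjunction(samples, n):
    """Elimination algorithm: start with all n variables in the conjunction,
    and drop any variable that is 0 in a positive example."""
    kept = set(range(n))
    for x, label in samples:
        if label:
            kept -= {i for i in kept if x[i] == 0}
    return kept  # hypothesis: predict positive iff all kept variables are 1

def predict(kept, x):
    return all(x[i] == 1 for i in kept)

random.seed(0)
n, target = 6, {0, 2}  # hidden concept (assumed for the demo): x[0] AND x[2]

def draw():
    """One sample from a uniform distribution over {0,1}^n."""
    x = tuple(random.randint(0, 1) for _ in range(n))
    return x, all(x[i] == 1 for i in target)

train = [draw() for _ in range(500)]
kept = learn_conjunction(train, n)

test = [draw() for _ in range(1000)]
error = sum(predict(kept, x) != label for x, label in test) / len(test)
print(kept, error)
```

The hypothesis can only err with false negatives (it always keeps at least the target's variables), and with enough samples the generalization error drops below any desired accuracy with high probability — exactly the PAC guarantee.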

There is a rich literature on PAC-learnability. Of biggest interest to me are results that show that certain classes are not learnable — results that put limits on the magic of machine learning.

My hope is to use similar approaches to understand limits and ultimate constraints on evolution.

By 2009 — well before I was thinking about any of this — Leslie Valiant had built a bridge between CoLT and evolution.

For him, the samples became environmental challenges faced by biological organisms (the hypotheses), and evolution acted as the learning algorithm to better adapt the population to these challenges. Although Valiant did not frame his theory — what I now call algorithmic Darwinism — in terms of fitness landscapes, it can be interesting to think about it in these terms.

In a standard fitness landscape, we think of each vertex as a genotype — a string in \{0,1\}^n. This changes for algorithmic Darwinism. Here, each vertex in the fitness landscape is a function describing the response of the organism to potential environmental challenges. In the simplest case of a binary response, this means each vertex in the fitness landscape has the form f: \{0,1\}^n \rightarrow \{1,-1\}. Given an ideal function h and a distribution D of environmental challenges, we have for each vertex a distribution of results corresponding to a random variable with mean \langle f(x)h(x) \rangle_{x \sim D} — i.e. the correlation between the organism’s response and the ideal response to the environment.

The standard approach of fitness landscapes would be to take this correlation as a single scalar and call it fitness. But Valiant’s experience with machine learning steered him away from this simple approach. Instead — in algorithmic Darwinism — evolution only has access to an empirical estimate of this correlation from s samples of the distribution. Biologically, we can think of s as some combination of the population size and the organisms’ length of life. For Valiant, it is important that s is finite, but variation in it is not important. Differences in s will matter more for Xue.
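A quick simulation shows how noisy that empirical estimate is. The particular ideal function and organism response below are placeholders chosen only for illustration; the point is the way the noise in the estimated correlation shrinks like 1/\sqrt{s}:

```python
import random
import statistics

random.seed(1)
n = 8

def h(x):
    """Hypothetical ideal response: the parity of the challenge string."""
    return 1 if sum(x) % 2 == 0 else -1

def f(x):
    """Hypothetical organism response: only reads the first bit."""
    return 1 if x[0] == 1 else -1

def empirical_fitness(s):
    """The correlation <f(x)h(x)> estimated from s sampled challenges --
    all that evolution gets to see in algorithmic Darwinism."""
    xs = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(s)]
    return sum(f(x) * h(x) for x in xs) / s

# Repeat the estimate many times to measure its spread at each sample size:
stds = {}
for s in (10, 100, 10_000):
    estimates = [empirical_fitness(s) for _ in range(200)]
    stds[s] = statistics.stdev(estimates)
print(stds)
```

More samples give a sharper view of the true correlation, which is what makes s — population size times lifespan — a meaningful resource for evolution.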

This sampling effectively builds in forces similar to those of nearly-neutral evolution. If two types f and f’ have true correlations with h that differ by an amount that is exponentially small in s, then there is no way for evolution to select one type over the other. Thus, taking into account the effect of sampling in our definition prevents evolution from climbing up extremely shallow inclines. This can naturally produce interesting global consequences for evolution.
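This indistinguishability is easy to see in simulation. Below, two hypothetical types have true correlations that differ by far less than the sampling noise 1/\sqrt{s}, so the empirically fitter type is essentially a coin flip (the specific means and sample sizes are assumptions for the demo):

```python
import random

random.seed(2)
s = 1000  # number of sampled challenges per type
mu_f, mu_g = 0.50, 0.50 + 1e-6  # true correlations: g is "better", barely

def empirical(mu):
    """Empirical correlation from s challenges, each contributing +1 or -1
    so that the true mean is mu."""
    return sum(1 if random.random() < (1 + mu) / 2 else -1
               for _ in range(s)) / s

trials = 2000
wins = sum(empirical(mu_g) > empirical(mu_f) for _ in range(trials))
ratio = wins / trials
print(ratio)  # close to 0.5: selection cannot tell the two types apart
```

With a difference this small relative to the noise, no finite-sample process can reliably rank the types — the landscape is effectively flat here.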

The perspective of algorithmic Darwinism is that the fitness of any given organism is stochastic.

Mutants as samples from a fitness distribution

Julian Z. Xue provides an alternative perspective. He views our label of a given genotype or phenotype as encompassing many different possible organisms with different deterministic fitnesses.

This is easier to understand with fitness landscapes defined over trait (or phenotype) space. Consider some trait — say arm length. There are many ways that an organism could have long arms. Some of these ways might be beneficial to reproduction and some detrimental. The result is that we associate with each point in trait space a distribution of different fitnesses corresponding to different ways of implementing that trait.

Whereas a sample for Valiant was an environmental challenge experienced by a population, a sample for Julian is a mutant with a given trait. Each mutant has its own distinct fitness and can give rise to a distinct lineage. What starts to matter is not the average fitness of a phenotype but the highest fitness of a lineage realizing that phenotype. What matters is the tail of the distribution.

From this perspective, it matters both that the number of samples s is finite and that it can differ between phenotypes. This takes us back to supply-driven evolution.

Let’s return to Julian’s favourite trait: biological complexity.

It is certainly the case that different organisms of the same complexity can have different fitnesses. And — for the sake of argument — it might be that more complex organisms are on average less fit.

But if a much larger number of mutants result in higher complexity than in lower complexity, then we end up drawing more samples from the fitness distribution corresponding to high complexity than from the one for low complexity. If this discrepancy is large enough then, even if the average fitness of lower-complexity organisms is higher, the higher-complexity trait will get to sample further into the tail of its distribution. This can give it a higher chance of finding a high-fitness lineage. That lineage will then take over, resulting in a higher-complexity population, even though the lower-complexity type might have had higher average fitness.
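A toy simulation makes the argument vivid. The Normal fitness distributions, the means, and the mutant counts below are all assumptions for illustration — Xue's argument doesn't depend on these specifics, only on one type getting many more draws than the other:

```python
import random

random.seed(3)

def best_mutant(mean, s):
    """Highest fitness among s mutants drawn from a Normal(mean, 0.2)
    fitness distribution (a hypothetical choice of distribution)."""
    return max(random.gauss(mean, 0.2) for _ in range(s))

trials = 1000
# Low complexity: higher mean fitness, but only 10 mutants per round.
# High complexity: lower mean fitness, but 1000 mutants per round.
wins = sum(best_mutant(0.9, 1000) > best_mutant(1.0, 10)
           for _ in range(trials))
ratio = wins / trials
print(ratio)  # well above 0.5: the bigger supply samples deeper into the tail
```

Even though the high-complexity distribution has a lower mean, its hundredfold larger supply of mutants lets it reach further into its tail, so its best lineage usually beats the low-complexity type's best lineage.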

This can allow the stochastic demands of natural selection and the stochastic supply of mutation to work together in ways that might seem paradoxical from considerations of mean fitness.

Both Leslie Valiant and Julian Xue give us some interesting considerations for how extending the fitness landscape from scalars to distributions can produce new results. But there is much to explore in this direction.

And I am sure there are other approaches to moving from scalars to distributions that I am unaware of. So if you know particularly great examples, dear reader, then please let me know in the comments.

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

4 Responses to Fitness distributions versus fitness as a summary statistic: algorithmic Darwinism and supply-driven evolution

  1. Alistair says:

    Richard Feynman said that to make progress with a theory “you have to guess, everybody guesses”.
    Isn’t it the case that human brains use theories – models – that are best guesses (what is most probable) to help us survive?
    And natural selection has supervised the development of the guessing process.
    If I guess that someone is my friend and I am wrong, then they could steal some food from me and my tribe. But if I am better than average at guessing where to find food, my tribe may still think I am useful and women will mate with me to perpetuate my food-finding abilities.
    Over many generations the quality of the guessing capabilities of my ancestors will be selected for or against depending on which capabilities are most important at any one time.
    If we lead largely solitary lives evolution will not enhance any single one of our guessing capabilities too much or else our brain will become too specialised and this will leave us vulnerable in a complex environment. But in a large tribe we can develop one guessing capability over others provided we cooperate and help each other. Einstein could not exist among hunter gatherers – at least not over many generations in which the environment changes its level of complexity.
    But no matter how good our guesses, as individuals or hunter gatherers, if we cannot physically move around in our environment to put those guesses to practical use then we won’t survive (obviously tribal leaders in cooperative societies can give orders that help everybody survive).
    So for individuals, genotypes for increased physical prowess should have a higher fitness scalar value than genotypes for the quality of guessing. But in a tribe this isn’t necessarily the case.

  2. Julian Xue says:

    Thanks a lot buddy. In my own work I really think more about phenotypes, less about genotypes. If two identical genotypes end up having different fitness, that doesn’t matter in my theory, because that difference in fitness isn’t transmitted to future generations. I’m thinking much more about how two creatures with the same phenotypic measure — the same trait value — have different fitnesses that transmit to the future.

    Sean Rice is the one who thinks about genotypes a lot. He’s the one who really figured this out: given any genotype, because of environmental stochasticity, that genotype will have a certain distribution of fitness, call it D. Now, how does natural selection work on D? It turns out that it doesn’t just try to maximize the mean of D — it tries to maximize all the odd moments (mean, skew, etc.) while minimizing even moments (variance, kurtosis). I have to say, it was really fascinating work, and it showed how some genotypes, by increasing skew or reducing variance/kurtosis, can be selected for even though they had low mean fitness.

    • I’ve written before about how natural selection works on D, based in part on utility theory and in part on Orr & Gillespie. But I am not familiar with Rice’s work on this. Can you point me to a specific paper of Rice’s to read? The ‘maximize odd moments, minimize even moments’ seems like a dangerously simple conclusion and I’d be interested in the detailed argument behind it.

      I know that Dan Nichol has done a lot of good thinking about bet hedging, and maybe knows about some of the worries on when the bet is independently made by each generation or (partially) transmitted down the lineage. This all seems relevant. Maybe I can rope him into this conversation.

  3. Pingback: Game landscapes: from fitness scalars to fitness functions | Theory, Evolution, and Games Group
