Black swans and Orr-Gillespie theory of evolutionary adaptation

The internet loves fat tails; they are why awesome things like Wikipedia, Reddit, and countless kinds of StackExchanges exist. Finance, on the other hand, hates fat tails; they are why VaR and financial crises exist. A notable exception is Nassim Taleb, who became financially independent by hedging against the 1987 financial crisis and made a multi-million dollar fortune on the recent crisis; to most he is known for his 2007 best-selling book The Black Swan. Taleb’s success has stemmed from his focus on highly unlikely events, or samples drawn from far out on the tail of a distribution. When such rare samples have a large effect, we have a Black Swan event. These events are obviously important in finance, but Taleb also stresses their importance to the progress of science, and here I will sketch a connection to the progress of evolution.

A more controversial aspect of The Black Swan is Taleb’s dismissal of statistics, and his labeling of those who claim to make predictions from statistical models as charlatans (although he tones this down when talking to statisticians). Taleb presents a solid argument for financial distributions not being symmetric and violating assumptions of normality; unfortunately for his dismissal, statisticians and quants have long been aware of this and have developed tools to deal with it. Mathematical finance regularly uses fat-tailed distributions for asset returns, and statisticians developed extreme value theory (EVT) for understanding fat-tailed sampling; the Bank of Canada even has a guide for applying EVT to finance. Maybe that is one of the reasons why Canada didn’t feel the crisis as much? Developed by famed evolutionary biologist R.A. Fisher, statistician L.H.C. Tippett, and mathematician E.J. Gumbel, EVT deals with extreme deviations from the median of a probability distribution. One of the nice results of EVT is that when we look at the asymptotic behavior of a distribution’s tail, many of the nitty-gritty details of the distribution become irrelevant and only the general qualitative features of the tail matter. This is extremely useful when we don’t have a good idea of what our distribution should look like, as is common in finance and, as we learned on Sunday, in the study of adaptive fitness landscapes.
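To make the "details become irrelevant" claim concrete, here is a minimal sketch in Python (my choice of language, standard library only). It draws blocks of exponential samples, keeps only the maximum of each block, and compares the centred maxima to the standard Gumbel distribution. The block size, number of blocks, seed, and the names `centred_block_maxima` and `gumbel_cdf` are assumptions made for illustration; this is a numerical sanity check, not a proof.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of the standard Gumbel distribution


def centred_block_maxima(block_size, n_blocks, rng):
    """For each block, return max of `block_size` Exp(1) draws minus ln(block_size)."""
    shift = math.log(block_size)
    return [
        max(rng.expovariate(1.0) for _ in range(block_size)) - shift
        for _ in range(n_blocks)
    ]


def gumbel_cdf(x):
    """CDF of the standard Gumbel distribution."""
    return math.exp(-math.exp(-x))


rng = random.Random(2013)  # arbitrary seed, only for reproducibility
maxima = centred_block_maxima(block_size=200, n_blocks=2000, rng=rng)

empirical_mean = sum(maxima) / len(maxima)
empirical_cdf_at_zero = sum(m <= 0.0 for m in maxima) / len(maxima)

print(f"mean of centred maxima: {empirical_mean:.3f} (Gumbel mean: {EULER_GAMMA:.3f})")
print(f"P(max <= 0): {empirical_cdf_at_zero:.3f} (Gumbel CDF at 0: {gumbel_cdf(0.0):.3f})")
```

Swapping `rng.expovariate(1.0)` for another light-tailed sampler changes the centring and scaling you need, but the same limiting shape should appear; a fat-tailed sampler would not settle onto the Gumbel shape.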

Orr-Gillespie theory (Gillespie, 1991; Orr, 2002; 2006) is based on Gillespie’s insight that the wild-type represents a draw from the tail of a fitness distribution, and beneficial mutations are even more extreme draws from this tail. The typical model considers n loci, a parameter describing the tail of the distribution, and a wild-type whose fitness lies far in the tail. At each time step, we draw n samples from the distribution (one for each potential mutation), and if one of them has higher fitness than our current value, we take an adaptive step and repeat the process. Most early work (Orr, 2002; 2003; 2006) uses the Gumbel distribution for the maximum of the samples drawn, which is only a good model for distributions with exponential-like tails, but newer work is starting to generalize to fat-tailed distributions of fitnesses (Jain, 2011). Unfortunately, this basic model has a fundamental limitation: it completely ignores the structure of the landscape. Sampling from a fixed fitness distribution is like taking a mean-field (or zero-order) approximation of the underlying graph. In an analogy to evolutionary games on graphs, this is like using replicator dynamics to describe a structured population.
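The sampling procedure above can be sketched in a few lines of Python. This is a toy version under stated assumptions, not the model from any of the cited papers: the function name `adaptive_walk`, the exponential fitness distribution, and the rule that the fittest beneficial mutant is the one that fixes are all my choices for illustration.

```python
import random


def adaptive_walk(n_loci, wild_type_fitness, draw_fitness, rng, max_steps=10_000):
    """Mean-field adaptive walk on a fixed fitness distribution.

    Each step draws one fitness value per potential mutation. If the best
    draw beats the current fitness we move to it (assumption: the fittest
    mutant fixes); otherwise we are at a local peak and stop.
    """
    fitness = wild_type_fitness
    history = [fitness]
    for _ in range(max_steps):
        candidates = [draw_fitness(rng) for _ in range(n_loci)]
        best = max(candidates)
        if best <= fitness:
            break  # no beneficial mutation among the n draws
        fitness = best
        history.append(fitness)
    return history


def exponential_fitness(rng):
    """Hypothetical fitness distribution with an exponential tail."""
    return rng.expovariate(1.0)


walk = adaptive_walk(
    n_loci=100,
    wild_type_fitness=5.0,  # already far in the tail of Exp(1)
    draw_fitness=exponential_fitness,
    rng=random.Random(42),
)
print(f"adaptive steps taken: {len(walk) - 1}")
print(f"final fitness: {walk[-1]:.3f}")
```

Notice that nothing in `adaptive_walk` knows which genotype is adjacent to which; every step redraws from the same distribution. That is exactly the mean-field simplification I complain about.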

Kryazhimskiy et al. (2009) have moved beyond this basic set-up to fitness distributions that depend on the current fitness of the wild type. This allows a first-order approximation, because it can account for the fact that fitter genotypes are more likely to have fitter neighbours than sampling the whole population would suggest. The authors consider three possible ways that the expected fitness of a mutant can change with the current fitness of the wild-type, and five ways that the expected fitness increment of a beneficial mutation can change with the fitness of the wild-type. This leads to 14 qualitatively different profiles for fitness versus time and number-of-adaptations versus time (two combinations have the same qualitative dynamics, hence 14 and not 15 in total). They use these profiles to provide an empirical classification of epistasis based on just the history of fitness change and number of adaptations, instead of direct examination of the fitness landscape. This is a tempting approach, but since it does not look directly at the fitness landscape, it cannot distinguish sign epistasis from being very close to a smooth fitness optimum. It also has absolutely no means of seeing reciprocal sign epistasis, because that would require a second-order approximation that can detect inaccessible higher-fitness vertices two steps away.
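As a rough illustration of the difference between the zero-order and first-order pictures, the following Python sketch lets the mutant fitness distribution take the current fitness as an argument and records both quantities the authors track: fitness and number of adaptations over time. The two mutant distributions (`uncorrelated` and `correlated`), their parameters, and the greedy fixation rule are hypothetical choices of mine, not the families analysed by Kryazhimskiy et al.

```python
import random


def correlated_walk(start_fitness, mutant_fitness, n_mutants, n_generations, rng):
    """Record (fitness, number of adaptive steps) at every generation.

    `mutant_fitness(current, rng)` may depend on the current wild-type
    fitness, which is what makes this a first-order approximation.
    """
    fitness, substitutions = start_fitness, 0
    trajectory = [(fitness, substitutions)]
    for _ in range(n_generations):
        candidates = [mutant_fitness(fitness, rng) for _ in range(n_mutants)]
        best = max(candidates)
        if best > fitness:  # assumption: the fittest beneficial mutant fixes
            fitness, substitutions = best, substitutions + 1
        trajectory.append((fitness, substitutions))
    return trajectory


def uncorrelated(current, rng):
    """Zero-order case: mutant fitness ignores the wild type entirely."""
    return rng.expovariate(1.0)


def correlated(current, rng):
    """Hypothetical first-order case: mutants scatter around the wild type,
    with most of them slightly worse than their parent."""
    return current + rng.gauss(-1.0, 0.5)


for name, rule in [("uncorrelated", uncorrelated), ("correlated", correlated)]:
    path = correlated_walk(1.0, rule, n_mutants=10, n_generations=200,
                           rng=random.Random(7))
    final_fitness, adaptations = path[-1]
    print(f"{name}: final fitness {final_fitness:.2f} after {adaptations} adaptations")
```

The point of the exercise is only that the two rules produce differently shaped curves for fitness and for number of adaptations against time; reading epistasis off those curves is the step I am skeptical about.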

Overall, I have mixed feelings about Orr-Gillespie theory. On the one hand, it allows researchers to make fewer assumptions about the particulars of the fitness distribution and connects nicely to experimental data that is easier to gather. On the other hand, it completely throws away the defining feature of fitness landscapes: the underlying graph of adjacent genotypes. This insistence on random draws from probability distributions that we don’t understand is a paradigm set by Ohta (1977) and Kimura (1979) that has been reinforced by physicists trained in statistical mechanics entering the field. I think it is important to embrace the alternative approach where we focus on the structure and not the statistics of fitness landscapes. We should take Taleb’s advice and admit our ignorance in describing these unknown distributions, even if we just care about the tail. Theoretical computer science offers us tools in this regard: it provides alternatives to probability like worst-case or adversarial analysis, and nondeterminism. Maybe viewing evolutionary adaptation through the algorithmic lens is the sort of intellectual Black Swan event that Taleb thinks is essential for scientific progress.

The header picture combines Fat Tails by Gnight, based on the Sonic the Hedgehog character, with a graph of financial data from the 2007-2008 financial crisis; in green and blue are LIBOR and the USGG3M, and in red is the TED spread, which shows the loss of confidence in credit during the crisis. This is the third in a series of posts on computational considerations of empirical and theoretical fitness landscapes. In the previous post I gave a historical overview of fitness landscapes as mental and mathematical metaphors, and highlighted our empirical knowledge of them.

References

Gillespie, J.H. (1991). The causes of molecular evolution. Oxford University Press.

Jain, K. (2011). Number of adaptive steps to a local fitness peak. EPL 96: 58006.

Kimura, M. (1979). Models of effectively neutral mutations in which selective constraint is incorporated. Proc. Natl. Acad. Sci. USA 76: 3440-3444.

Kryazhimskiy, S., Tkacik, G., & Plotkin, J.B. (2009). The dynamics of adaptation on correlated fitness landscapes. Proc. Natl. Acad. Sci. USA 106(44): 18638-18643. DOI: 10.1073/pnas.0905497106

Ohta, T. (1977). Extension to the neutral mutation random drift hypothesis. In Kimura, M. (Ed.) Molecular evolution and polymorphism (pp. 148-167). National Institute of Genetics, Mishima, Japan.

Orr, H.A. (2002). The population genetics of adaptation: the adaptation of DNA sequences. Evolution 56: 1317-1330.

Orr, H.A. (2003). The distribution of fitness effects among beneficial mutations. Genetics 163: 1519-1526.

Orr, H.A. (2006). The population genetics of adaptation on correlated fitness landscapes: the block model. Evolution 60(6): 1113-1124.


About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.
