# Holey adaptive landscapes

September 8, 2011

Hello world :-)

My research interests have veered away from pure EGT, but my questions still center on evolution, particularly the evolution of complex systems that are made up of many small components working in unison. Lately I’ve been studying Gavrilets *et al.*’s model of holey fitness landscapes, which I think has great potential for studying macroevolution, that is, evolution on very long timescales. I’m not the first to this idea, of course: Arnold and many others have seen the possible connection, although I think of it in a rather different light.

In this first post, I will give a short summary of this model, cobbled together from several papers and Gavrilets’ book, Fitness Landscapes and the Origin of Species. The basic premise is that organisms can be characterized by a large number of traits. By large, we mean very large: thousands or more. Gavrilets envisions these as the genes of an organism, so tens of thousands. The important thing is that each of these traits can change independently of the others.

The idea that organisms are points in a very high dimensional space is not new; Fisher had it in his 1930 classic The Genetical Theory of Natural Selection, where he used this insight to argue for micromutationism: in such a high dimensional space, most mutations of appreciable size are detrimental, so Fisher argued that most mutations must be small. (This result was later refined by Kimura, Orr, and others, who argued that most mutations that fix must be of intermediate size, since tiny mutations are unlikely to fix in large populations.)

However, even Fisher didn’t see another consequence of high dimensional space, which Gavrilets exploited mercilessly: in a space of high enough dimension, there is no need to cross fitness valleys to move from one high fitness phenotype to another; all high fitness genotypes are connected. This is because connectivity is exceedingly easy in high dimensional space. Consider two dimensions: to get from one point to another, there are only two directions to move in. Every extra dimension offers a new option for such movement. That is why there is a minimum dimensionality for chaotic behavior: we can’t embed a strange attractor in a two dimensional phase plane, since trajectories can’t help but cross each other. Three dimensions is better, but *n*-dimensional space, where *n* is in the tens of thousands, is really powerful stuff.

Basically, every phenotype (every point in *n*-D space) is connected to a huge number of other points in *n*-D space. That is, every phenotype has a huge number of neighbors. Even if the probability of being a highly fit organism is exceedingly small, chances are high that a highly fit one exists among this huge number of neighbors. We know that if each highly fit phenotype is, on average, connected (via mutation) to at least one other highly fit phenotype, then the percolation threshold is reached and almost all highly fit phenotypes join one giant connected component. In this way, evolution does not have to traverse fitness minima.

If we consider mutations to be point mutations of genes, then mutations can be considered a Manhattan-distance-type walk in *n*-D space. That’s just a fancy way of saying that we have *n* genes, and only one can be changed at a time. In that case, the number of neighbors any phenotype has is *n*, and if the probability of being highly fit is better than 1/*n*, then highly fit organisms are connected. This is even easier if we consider mutations to be random movements in *n*-D space. That is, if we consider an organism to be characterized by x = (x_1, …, x_n), where x_i is the i-th trait, and a mutation from x results in x' = (x_1 + e_1, …, x_n + e_n), such that each e_i is a random small number that can be negative, and the Euclidean distance between x and x' is less than d, where d is the maximum mutation size, then the neighbors of x fill up the volume of a ball of radius d around x. The number of neighboring phenotypes in this ball grows extremely quickly with *n*, so even a tiny probability of being highly fit will find some neighbor of x that is highly fit, because of the extremely large number of neighbors even for reasonably sized *n*.
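To make the threshold argument concrete, here is a quick simulation sketch of my own (not from Gavrilets’ papers): label each vertex of the *n*-cube “highly fit” with probability p, then measure the largest connected component of fit vertices. The function name and parameter choices are mine; the point is only to illustrate how the picture changes around p ≈ 1/*n*.

```python
import random
from collections import deque

def largest_fit_component(n, p, seed=1):
    """Mark each vertex of the n-dimensional hypercube 'highly fit' with
    probability p, then return (size of largest connected component of fit
    vertices, total number of fit vertices). Neighbors differ in one bit,
    i.e. one point mutation."""
    rng = random.Random(seed)
    fit = [rng.random() < p for _ in range(1 << n)]
    seen = [False] * (1 << n)
    big = 0
    for start in range(1 << n):
        if not fit[start] or seen[start]:
            continue
        size, queue = 0, deque([start])
        seen[start] = True
        while queue:
            v = queue.popleft()
            size += 1
            for i in range(n):          # flip bit i: one point mutation
                u = v ^ (1 << i)
                if fit[u] and not seen[u]:
                    seen[u] = True
                    queue.append(u)
        big = max(big, size)
    return big, sum(fit)

n = 14                                  # 1/n is roughly 0.07
for p in (0.02, 0.30):                  # well below and well above 1/n
    big, total = largest_fit_component(n, p)
    print(f"p={p}: largest fit component has {big} of {total} fit genotypes")
```

Below the threshold the fit vertices sit in tiny isolated islands; above it, most of them merge into one giant component, exactly the percolation picture described above.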

The fact that evolution may never have to cross fitness minima is extremely important; it means that most of evolution may take place on “neutral bands”. Hartl and Taube foresaw this really interesting result. Gavrilets mainly used this result to argue for speciation, which he envisions as a process that takes place naturally through reproductive isolation, with no need for natural selection.

Several improvements over the basic result have been achieved, mostly in the realm of showing that even if highly fit phenotypes are highly correlated (forming “highly fit islands” in phenotype space), the basic result of connectivity nevertheless holds (i.e. there will be bridges between those islands). Gavrilets’ book summarizes some early results, but a more recent paper (Gravner *et al.*) is a real tour-de-force in this direction. Their last result shows that the existence of “incompatibility sets” (sets of traits that destroy viability) nevertheless does not punch enough holes in *n*-D space to disconnect it. Overall, the paper shows that even with correlation, percolation (connectedness of almost all highly fit phenotypes) is still the norm.

Next Thursday, I will detail some of my own criticisms of this model and its interpretation. The week after next, I will hijack this model for my own purposes and attempt to show that such a model can display a great deal of historical contingency, leading to irreversible, Muller’s Ratchet type evolution that carries on in particular directions even against fitness considerations. This type of model, I believe, will provide an interesting bridge between micro- and macroevolution.

Great first post Julian!

However, I have a qualm with the following statement:

>That’s just a fancy way of saying that we have n genes, and only one can be changed at a time. In that case, the number of neighbors any phenotype has is n, and if the probability of being highly fit is better than 1/n, then highly fit organisms are connected.

You say this as if it is a mathematical fact. Do you maybe mean “if the probability of a highly fit genotype having a highly fit neighbor is better than 1/n, then with high probability fit organisms form a huge connected component”? As stated, however, it is not hard to come up with relatively trivial counterexamples: let our genotypes be n-bit strings and our fitness function “if the parity of the string is 1 then the organism is highly fit, if it is zero then the organism is completely unfit”. Clearly, 1/2 > 1/n of the genotypes are highly fit, yet no two highly fit genotypes are adjacent.
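A quick Python sketch makes the counterexample concrete (the names are just my choices for illustration):

```python
from itertools import product

n = 8
# 'highly fit' iff the bitstring has parity 1 (an odd number of ones)
fit = {g for g in product((0, 1), repeat=n) if sum(g) % 2 == 1}

# count ordered pairs of fit genotypes one point mutation apart
adjacent_fit_pairs = 0
for g in fit:
    for i in range(n):
        neighbor = g[:i] + (1 - g[i],) + g[i + 1:]
        adjacent_fit_pairs += neighbor in fit

print(len(fit), 2 ** n, adjacent_fit_pairs)  # → 128 256 0
```

Half of all genotypes are fit, yet flipping any single bit changes the parity, so no two fit genotypes are ever adjacent.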

Alternatively, do you mean that the fitness function has to be random? In which case, how is that justifiable?

Ake, rigor isn’t my strong suit… hahahhaa… thanks for catching that! Your statement is exactly what I meant:

“If the probability of a highly fit genotype having a highly fit neighbor is better than 1/n, then with high probability fit organisms form a huge connected component”

And you infer exactly correctly that there is a background assumption that the fitness function is random. It’s not justifiable at all, but it’s mathematically, er, easy. That’s why there has been so much work studying non-random fitness functions, ones that are highly correlated (if my phenotype is highly fit, then a nearby mutant of me has a much higher chance of being highly fit than some randomly chosen phenotype). Gravner et al. is the most serious thrust in this direction, although for the sake of tractability, their correlation functions still aren’t very believable. They’re much better than random, though.

Interesting. This leads me to a further question: in the discrete model, it seems like everyone wants to use hypercubes or something very similar. Is there a technical reason why working with them is easier or more justifiable than, say, random n-regular graphs, or some other specific graphs? In that case the degree of your vertices is analogous to dimension (all the possible ways you could move), and if you want to capture the notion of a “huge” genotype space, then you require that the graph be exponential in the degree. However, by considering arbitrary graphs (or particular graphs where there is some variety in degree), you can start to explore the sort of ratchet ideas you’ve mentioned before.

In particular, you can place some constraints on the graph: for instance, if a vertex has degree k, then any of its neighbors has degree k - 1, k, or k + 1. Then you can interpret moving from a k-degree to a (k+1)-degree vertex as either an increase in mutation rate or size, or simply as acquiring a new trait on which you can now mutate. You can then start building these sorts of models, and by showing that any random walk starting in a low-degree part of the graph will with high probability end up trapped in the high-degree part (even if there is a fitness reduction in the high-degree part), you will show a sort of drive towards complexity. Alternatively, if you show that the walker gets trapped in a high-degree component where most lower-degree neighbors are unfit, then you’ve shown the ratchet effect.
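As a toy sanity check of the trapping intuition (not the full model, just the well-known fact that a simple random walk on an undirected graph spends time at a vertex in proportion to its degree), here is a Python sketch; the graph and function names are made up for illustration:

```python
import random

def random_walk_time_in(graph, start, steps, region, seed=0):
    """Fraction of steps a simple random walk spends inside `region`."""
    rng = random.Random(seed)
    v, hits = start, 0
    for _ in range(steps):
        v = rng.choice(graph[v])     # jump to a uniformly random neighbor
        hits += v in region
    return hits / steps

# toy graph: vertices 0-4 form a path (low degree),
# vertices 5-9 form a clique (high degree), joined by the edge 4-5
graph = {i: [] for i in range(10)}
for i in range(4):
    graph[i].append(i + 1)
    graph[i + 1].append(i)
for i in range(5, 10):
    for j in range(5, 10):
        if i != j:
            graph[i].append(j)
graph[4].append(5)
graph[5].append(4)

frac = random_walk_time_in(graph, 0, 100_000, set(range(5, 10)))
print(f"time spent in the high-degree clique: {frac:.2f}")
```

The stationary distribution of a simple random walk is proportional to vertex degree (here the clique holds 21 of the 30 degree-endpoints, so roughly 0.7 of the time), which is the bare-bones version of the “walker drifts into the high-degree part” effect, before adding any fitness structure.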

I don’t know… not sure if I am making any sense.

You make perfect sense to me — but I can’t vouch for anyone else! Hehehehe… In fact, this is precisely what I had been thinking of.

The reason people use hypercubes is that that’s a realistic model of genetic mutation. If each genome is a string (11010101…) that can only mutate to a neighbor by flipping one of its 1’s or 0’s, that sets up a natural hypercube. Putting it on another graph would require some justification, but indeed, I do intend to consider graphs where there is variation in degree. Great variation, in fact.

Refresh me on what an n-regular graph is?

An n-regular graph would be an alternative (but, as you mention, a hard one to justify) to the hypercube. n-regular means that each vertex has degree n. The reason I suggested them is that random n-regular graphs are pretty well understood, and hence easier to work with.

Thinking a little further about the increase in complexity: I was going to write that there are two obvious extremes, and that both have flaws, but as I was typing up one of the extremes (the one I prefer) I realized that it can be modified into a pretty convincing model. Well, I guess you will be the judge of its convincingness. I will now describe the vertex and edge sets:

– each vertex is labeled by an integer k (call it a k-vertex) and a k-bit string

– each k-vertex is adjacent to every other k-vertex whose bitstring is Hamming distance 1 from it: in other words, the k-vertex with string x is adjacent to the k-vertex with string y if there is exactly one i such that x_i ≠ y_i.

– each k-vertex is adjacent to every (k+1)-vertex whose bitstring is the same as the k-vertex’s, except with one addition somewhere (either at the start, the end, or between two indexes). As an example, 01 will be adjacent to 001 and 010 and 011 and 101.

It is natural to then try to make this an infinite graph (i.e. every k is allowed), but unless there is a fitness penalty for larger and larger k, analysis will be hard. However, I think it is relatively natural to penalize organisms (at least on average) for having very long genomes, just because those would take more energy to read and copy.
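The adjacency above can be sketched in a few lines of Python (bit flips for the same-k edges, insertions for the k-to-(k+1) edges; deletions give the (k-1)-neighbors since adjacency is symmetric). The function name is just my choice:

```python
def neighbors(s):
    """Neighbors of bitstring s in the variable-length model sketched above."""
    out = set()
    for i in range(len(s)):                  # same-k: flip one bit
        out.add(s[:i] + ('1' if s[i] == '0' else '0') + s[i + 1:])
    for i in range(len(s) + 1):              # (k+1): insert one bit anywhere
        for b in '01':
            out.add(s[:i] + b + s[i:])
    for i in range(len(s)):                  # (k-1): delete one bit (symmetry)
        out.add(s[:i] + s[i + 1:])
    return out

print(sorted(x for x in neighbors('01') if len(x) == 3))
# → ['001', '010', '011', '101']
```

Note the degree now varies with both the length of the string and how many distinct strings the insertions produce, which is exactly the degree variation needed for the ratchet idea.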

The idea is exactly that even if there is a penalty for complexity (the penalty for long genomes must be tiny, since there are lots of fit organisms with tons and tons of nonfunctional DNA; the more interesting penalty is a penalty for complexity, period), complexity might nevertheless evolve, and evolve irreversibly :-))

Pingback: Criticisms of holey adaptive landscapes « Evolutionary Games Group

Pingback: Evolving viable systems « Evolutionary Games Group

Pingback: Micro-vs-macro evolution is a purely methodological distinction | Theory, Evolution, and Games Group

Pingback: Software through the lens of evolutionary biology | Theory, Evolution, and Games Group

Science Direct wants to charge me $40 to read those papers you reference…that makes me feel rather queasy!