# Fusion and sex in protocells & the start of evolution

In 1864, five years after reading Darwin’s On the Origin of Species, Pyotr Kropotkin — the anarchist prince of mutual aid — was leading a geographic survey expedition aboard a dog-sleigh — a distinctly Siberian variant of the HMS Beagle. In the harsh Manchurian climate, Kropotkin did not see competition ‘red in tooth and claw’, but a flourishing of cooperation as animals banded together to survive their environment. From this, he built a theory of mutual aid as a driving factor of evolution. Among his countless observations, he noted that no matter how selfish an animal was, it still had to come together with others of its species, at least to reproduce. In this, he saw both sex and cooperation as primary evolutionary forces.

Now, Martin A. Nowak has taken up the challenge of establishing cooperation as a central driver of evolution. With his colleagues, he has tracked the problem from myriad angles, and it is not surprising that he has recently turned to sex. In a paper released at the start of this month, Sam Sinai, Jason Olejarz, Iulia A. Neagu, & Nowak (2016) argue that sex is primary: we need sex just to kick-start the evolution of a primordial cell.

In this post, I want to sketch Sinai et al.’s (2016) main argument, discuss prior work on the primacy of sex, a similar model by Wilf & Ewens, the puzzle over emergence of higher levels of organization, and the difference between the protocell fusion studied by Sinai et al. (2016) and sex as it is normally understood. My goal is to introduce this fascinating new field that Sinai et al. (2016) are opening to you, dear reader; to provide them with some feedback on their preprint; and, to sketch some preliminary ideas for future extensions of their work.

### Prior work on the primacy of sex to evolution

Most biologists view, at least implicitly, asexual reproduction (asex) as simpler and more foundational than sexual reproduction (sex). They take asex as the obvious default state and thus leave sex as the thing to be explained. That is how the problem of sex arises (Barton & Charlesworth, 1998).[1] On the one hand, it is clear that sex is helpful because it can combine and thus share information between already adapted lineages, thus speeding up evolution. On the other hand, this combination is reshuffled after each reproduction by the random mixing of chromosomes from the parents. The question is then framed as: how do we balance these (and other similar) forces? There are countless explanations for this with varying sets of preconditions and assumptions, and the leading theory is one of plurality: sex arose in different lineages for different reasons (West et al., 1999).

Adi Livnat is not satisfied with this explanation. Instead, Livnat (2013; and others, see: Williams, 1966) argues that sex is the default state and without it evolutionary innovation is impossible. Empirically, he supports this by the rarity of obligate asexuals (Vrijenhoek, 1989) and that the ones we know of are all evolutionary dead-ends (Stebbins, 1957) that are confined to recent twigs on the tree of life (Williams, 1966; Van Valen, 1975), plus the rampant gene exchange in early life (Woese, 2002; Brosius, 2003; 2005). Theoretically, he proposes concrete mechanisms like writing phenotypes (Livnat, 2013; generalizing Wagner’s work, see Heard et al., 2010) and mutations from unsupervised Hebbian learning (Livnat & Papadimitriou, 2016). Although Sinai et al. (2016) do not reference this literature, they can be seen as working within it. Their aim is to show that the first protocells wouldn’t be able to get sufficiently complex to get metabolism and reproduction — and, with them, evolution — going without an analog of sex (protocell fusion) to assemble the pieces.[2]

### Sketch of the mathematical model and results

Sinai et al. (2016) suppose that a protocell[3] needs to contain N different parts to have sufficient complexity to start metabolism and reproduction. Their model tracks a single protocell, an accumulator, that is initialized by including each of the N components independently with probability p. Thus, by random chance alone, we can only expect a viable cell containing all N components to arise with an exponentially low probability $p^N$. Converting to the time domain, this means that we can expect to wait exponentially long ($O((\frac{1}{p})^N)$) for the first viable cell to arise. To overcome this, they let protocells fuse: a protocell has a probability $\delta$ of losing its membrane integrity before encountering a fusion partner. For simplicity, call this duration a time-step. In the case of disintegration, a new accumulator has to be sampled from scratch to replace it (so the $\delta = 1$ case is pure random sampling that takes exponentially long to converge). Otherwise, with probability $1 - \delta$, the cell fuses with another randomly created protocell and absorbs its components. This means that if an accumulator already has K out of N components then we can expect it to gain an average of p(N – K) new components for each time-step in which it maintains membrane integrity.
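To make the accumulator process concrete, here is a minimal Python sketch of the model as described above. It is a reading of this post's summary, not the authors' code: the function names, the seeding, and the choice to count one fusion-or-death event as one time-step are assumptions made for illustration.

```python
import random


def time_to_full_set(n_parts, p, delta, rng):
    """Time-steps until one accumulator holds all n_parts components."""

    def fresh_protocell():
        # Each component is present independently with probability p.
        return {i for i in range(n_parts) if rng.random() < p}

    accumulator = fresh_protocell()
    steps = 0
    while len(accumulator) < n_parts:
        steps += 1
        if rng.random() < delta:
            # Membrane failure: replace the accumulator with a fresh sample.
            accumulator = fresh_protocell()
        else:
            # Fusion: absorb every component of a randomly created partner.
            accumulator |= fresh_protocell()
    return steps


def mean_time(n_parts, p, delta, trials=200, seed=0):
    """Average time_to_full_set over independent trials (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(time_to_full_set(n_parts, p, delta, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # delta = 1 is pure random sampling; delta = 0 is fusion without death.
    for delta in (0.0, 0.1, 1.0):
        print(delta, mean_time(n_parts=8, p=0.5, delta=delta))
```

Even at a small N such as 8, the $\delta = 1$ run should take on the order of $(1/p)^N$ steps while the $\delta = 0$ run finishes in a handful, which is the gap the rest of the section quantifies.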

Sinai et al. (2016) go on to carefully approximate the expected time until a protocell accumulates all N components for all $0 < \delta < 1$ and also calculate it exactly for $\delta = 0$. For $0 < \delta < 1$ they show that the convergence time grows polynomially as $O(N^k)$ where $k = \frac{\log (1 - \delta)}{\log (1 - p)}$. In the limit of $\delta, p \ll 1$ this is approximately $\frac{1}{\delta}N^\frac{\delta}{p}$. Even more exciting is that for the important case of $\delta = 0$ the time to convergence scales logarithmically as $O(\log N)$. Thus, they achieve an exponential gap between de novo generation via random sampling and their fusion process. This separation becomes double exponential in the case of $\delta = 0$. Since we are not satisfied with life arising as an exponentially unreasonable event, we can argue that protocell fusion could have been the process that underlies the emergence of evolvable cells. Thus, sex is essential and foundational to evolution.
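As a quick numerical check of these scaling claims, the following Python sketch evaluates the exponent $k$ and compares it with the small-parameter estimate $\delta / p$. The function names and the example values of $\delta$, p and N are illustrative assumptions, and constant factors (such as the $\frac{1}{\delta}$ prefactor) are deliberately left out.

```python
import math


def fusion_exponent(delta, p):
    """Exponent k in the O(N^k) convergence time quoted above."""
    return math.log(1 - delta) / math.log(1 - p)


def random_sampling_time(n_parts, p):
    """Order of the waiting time, (1/p)^N, under pure random sampling."""
    return (1 / p) ** n_parts


if __name__ == "__main__":
    delta, p = 0.01, 0.1  # hypothetical small parameters
    k = fusion_exponent(delta, p)
    print(f"k = {k:.4f}; small-parameter estimate delta/p = {delta / p:.4f}")
    for n in (10, 20, 40):
        # Polynomial growth for fusion versus exponential growth for sampling.
        print(n, f"N^k = {n ** k:.3g}", f"(1/p)^N = {random_sampling_time(n, p):.3g}")
```

For small $\delta$ and p the two exponents agree closely, since $\log(1 - x) \approx -x$.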

### Prior mathematical modeling & the smooth fitness landscape assumption

The most exciting case for me is the huge speed up in the case of $\delta = 0$. Although Sinai et al. (2016) don’t discuss the connection, their model and analysis in this case are actually equivalent to the classic work of Wilf & Ewens (2010).[4] However, Wilf & Ewens (2010) did not think of their mechanism as sexual but as typical mutation; they even explicitly avoid using ‘alleles’ for gene variants to avoid potential confusion. In their model, at each time step, one of the genes (or molecules or whatever) that has not yet arisen can arise with some probability. This is just mutation in an asexual population on a Mt. Fuji fitness landscape with elitist dynamics.

This smooth fitness landscape assumption is essential for both Wilf & Ewens (2010) and the more general analysis by Sinai et al. (2016). As I’ve shown before, in rugged fitness landscapes like the NK-model (Kauffman & Levin, 1987)[5] even finding a local optimum (never mind the global one they are hunting for) can be impossible in polynomial time (Kaznatcheev, 2013). And in this case, I don’t think such landscapes are just a more complicated special case, but reveal an important feature of early chemical interaction networks.

To get rugged fitness landscapes, we need to have epistatic interactions in our model. There are a couple of ways to think about epistasis in the Sinai et al. (2016) model. First is to consider interactions like: if molecule A is added to a protocell that already contains molecule B then molecule A degrades molecule B. This would be equivalent to f(Ab) > f(aB) > (f(AB), f(ab)) epistasis where lower case letters mean the absence of the molecule. Second is to consider interactions like: if molecule A or B is added to a protocell on its own then it degrades, but if they are added together then they maintain each other. This would be equivalent to reciprocal sign epistasis with: f(AB) > f(ab) > (f(Ab), f(aB)). In the figure below, I consider examples of how to build up fitness landscapes from chemical interaction networks.
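The two-molecule orderings above can be checked mechanically. The Python sketch below classifies a two-locus landscape using the usual definition of sign epistasis, where the sign of one molecule's fitness effect depends on whether the other molecule is present. The function name and the numeric fitness values are hypothetical; only their ordering matters.

```python
def classify_epistasis(f_ab, f_Ab, f_aB, f_AB):
    """Classify a two-molecule landscape from its four fitness values.

    Lower case means the molecule is absent, upper case means it is present.
    """
    # Effect of adding A without B, and with B already present.
    a_flips = (f_Ab - f_ab) * (f_AB - f_aB) < 0
    # Effect of adding B without A, and with A already present.
    b_flips = (f_aB - f_ab) * (f_AB - f_Ab) < 0
    if a_flips and b_flips:
        return "reciprocal sign epistasis"
    if a_flips or b_flips:
        return "sign epistasis"
    return "no sign epistasis"


if __name__ == "__main__":
    # Non-interacting molecules: each addition always helps.
    print(classify_epistasis(f_ab=0, f_Ab=1, f_aB=1, f_AB=2))
    # f(AB) > f(ab) > (f(Ab), f(aB)): the mutual-maintenance case from the text.
    print(classify_epistasis(f_ab=2, f_Ab=1, f_aB=1, f_AB=3))
```

In the second call, the accumulator at ab cannot reach AB by single fitness-increasing additions, which is exactly what breaks the smooth-landscape analysis.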

Chemical interaction networks for molecular epistasis. In this figure, I translate between chemical interaction networks implicit in Sinai et al. (2016) and the fitness landscapes implicit in Wilf & Ewens (2010). In the top panels, we see an example protocell containing two molecules A and B. The interaction of the molecules is given by feedback lines. In the top center panel, for example, we see that A is self-inhibited (or limited by an external process) and that this self-inhibition is itself inhibited by the presence of B. In the bottom panels, we see the equivalent fitness landscapes where upper case letters correspond to the presence of the molecule in the protocell and the lower case letters represent the absence of the corresponding molecule. Both Wilf & Ewens (2010) and Sinai et al. (2016) only consider the non-epistatic (i.e. non-interacting) cases on the left-most panels. However, if we want to model enzymic parasitism then we should consider the center and right panels. From just the sign epistasis of the center panels we can build up semi-smooth landscapes of Kaznatcheev (2013) and from both the center and right panels we can build the NK-model of rugged fitness landscapes (Kauffman & Levin, 1987).

Although Sinai et al. (2016) don’t consider interacting molecules like the above, I think they are the most interesting cases. The reason we are trying to collect all N components together is because we think that in combination they are so reactive that they will implement primordial metabolism or force the reproduction of the protocell.[2] But then why would the molecules be completely independent of each other and unreactive when a subset is assembled? Of course, all subsets could interact positively such that adding an extra component always makes the system just as durable or more. However, from the perspective of the emergence of higher levels of organization — or ‘evo-ego’ in the terminology of Watson & Szathmary (2016) — the interesting cases are those where the interests of the sub-components aren’t already aligned. I think these epistatic interactions between components need to be considered if we want to have a full response to the problem of parasitism of cooperative enzymes (see Hogeweg & Takeuchi, 2003; Bianconi et al., 2013; Bansho et al., 2016) that Sinai et al. (2016) engage with.

### Differences between sex and fusion

Finally, returning to a central tenet of Sinai et al. (2016): the identification of protocell fusion and ‘sex’. Although fusion and sex do have some conceptual overlap, fusion lacks the essential parts of both (1) separation/recombination and the (2) joining of successful lineages.

In their discussion, Sinai et al. (2016) write:

> the expected time to reach the target set of N components is reduced if cells divide (and retain some components) instead of losing all components through death.

This is true if cell division is as infrequent as death is in their model; for example, if cells divide with probability $\delta$ and fuse with probability $1 - \delta$. However, for sexual organisms, the child does not simply get the genomes of both parents, as happens with fusion. Instead, the genomes of the parents are recombined at random, and so the child gets a (strict, for non-identical parents) subset of the parents’ combined genome. In the Sinai et al. (2016) model this would be a ‘fusion’ that is instantly followed by a ‘division’. But in that scenario, their arguments for speed-up no longer hold. The ‘problem of sex’ (Barton & Charlesworth, 1998) that Livnat (2013) and others tackle is that it is difficult to off-set the tendency of sex to break up good combinations of genes (or acquired molecules). Sinai et al. (2016) only allow fusion to expand good combinations. We would need to extend their work to handle cases of frequent reshuffling to turn fusion into sex and directly address the problem of sex.
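One way to see why the speed-up is lost is to simulate a fusion that is immediately followed by a division. The Python sketch below does this under a loudly hypothetical division rule that is not in Sinai et al. (2016): after each fusion, the tracked daughter keeps each component independently with probability `keep_prob`. All names and parameter values are illustrative.

```python
import random


def steps_until_complete(n_parts, p, keep_prob, rng, max_steps=1_000_000):
    """Fusion events until the tracked lineage holds all n_parts components.

    keep_prob = 1.0 is pure fusion (the delta = 0 case). keep_prob < 1.0 adds
    an immediate 'division' after every fusion (an assumed rule, see above).
    """

    def fresh_protocell():
        return {i for i in range(n_parts) if rng.random() < p}

    lineage = fresh_protocell()
    for step in range(max_steps):
        if len(lineage) == n_parts:
            return step
        fused = lineage | fresh_protocell()
        # Division: the tracked daughter retains a random subset.
        lineage = {i for i in fused if rng.random() < keep_prob}
    return max_steps


def mean_steps(n_parts, p, keep_prob, trials=50, seed=0):
    rng = random.Random(seed)
    total = sum(steps_until_complete(n_parts, p, keep_prob, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    print("fusion only:", mean_steps(6, 0.5, keep_prob=1.0))
    print("fusion then division:", mean_steps(6, 0.5, keep_prob=0.5))
```

Under this rule the lineage keeps losing what it has gathered, so completing the set again becomes a rare event rather than a steady accumulation.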

A way forward with this is to recognize one of the greatest benefits of sex: combining information from separately evolved lineages. Instead of tracking a single accumulator and adding a freshly generated protocell to it during fusion, we could track a number of accumulator lineages and allow two established accumulators to merge.

Without this merging of established accumulators, the Sinai et al. (2016) model cannot be separated from an asexual variant. I will sketch such a variant of the Sinai et al. (2016) model. Consider an accumulator with K out of N parts already in place. At each time-step, it either dies with probability $\delta$ or, if it survives (with probability $1 - \delta$), then with probability p it absorbs a single other molecule. Maybe the molecule is able to diffuse across the protocell membrane or some such mechanism. This molecule is new to the accumulator with probability 1 – K/N and thus moves the accumulator to state K + 1. This replaces the step of merging with another brand new accumulator used by Sinai et al. (2016) that on average gave about pN opportunities that each increase the accumulator with probability 1 – K/N.

I don’t think that any biologist would read my model of a single molecule diffusing across a membrane as ‘sexual’. In particular, it is equivalent to the classical model of genes being fixed one after the other in an asexual clonal population. However, in the case of $\delta = 0$ my model would converge to all N parts in time roughly $O(N \log N)$, giving the same huge speed up over the exponential search of blind luck. Further, the factor of N slowdown compared to the Wilf & Ewens (2010) and Sinai et al. (2016) model is not due to some deep effect but simply that fusing with a new accumulator gives pN opportunities for new components per time step instead of just 1 in the model that I described. So it isn’t surprising to see a roughly factor of N slowdown.[6] To say that sex gives a speed up over asex (and to thus argue that it is essential for starting evolution), we would need to separate our model of sex from the absorption of one part model sketched above. However, I am sure this is possible if we consider the case of merging already established accumulators. The difficulty, of course, hides in the detailed mathematical analysis.
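The single-molecule variant is easy to simulate, and for $\delta = 0$ it reduces to a slowed-down coupon collector process with expected time $\frac{N}{p} H_N \approx \frac{1}{p} N \log N$, where $H_N$ is the N-th harmonic number. The Python sketch below compares a simulation against that closed form. Starting from an empty accumulator, and resetting to empty on death, are assumptions made here for illustration.

```python
import random


def absorption_time(n_parts, p, delta, rng):
    """Time-steps until the accumulator has absorbed all n_parts molecules."""
    have = set()
    steps = 0
    while len(have) < n_parts:
        steps += 1
        if rng.random() < delta:
            have = set()  # death: start again from an empty accumulator
        elif rng.random() < p:
            # One molecule crosses the membrane; it may already be present.
            have.add(rng.randrange(n_parts))
    return steps


def coupon_collector_estimate(n_parts, p):
    """Expected time for delta = 0: (N / p) * H_N."""
    harmonic = sum(1 / k for k in range(1, n_parts + 1))
    return n_parts * harmonic / p


if __name__ == "__main__":
    rng = random.Random(0)
    n_parts, p, trials = 10, 0.5, 2000
    simulated = sum(absorption_time(n_parts, p, 0.0, rng) for _ in range(trials)) / trials
    print("simulated:", simulated)
    print("closed form:", coupon_collector_estimate(n_parts, p))
```

The simulated mean should sit close to the closed form, which makes the polynomial (rather than exponential) waiting time of this clearly asexual process explicit.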

### Notes and References

1. The problem of cooperation follows a similar structure. We assume that selfishness is the default state and that leaves the evolution of cooperation as the thing to be explained. But should we turn this question on its head and instead ask about the evolution of selfishness? One of the things that makes reading Kropotkin’s Mutual Aid — and apparently other Russian naturalists from that time, although I need to read more on this — so refreshing is that he doesn’t take selfishness as the starting point; he just marvels at the cooperative nature of the world and slowly explores how we’ve become more selfish. There are probably deep connections between why selfishness feels like the clear default and the naturalization of capitalism and ideology of (neo)liberalism, but this footnote can only hold so much. I’ll leave that exploration for future posts.
2. This approach of considering a period of protocells that are incapable of primarily vertical gene transfer and must rely on horizontal transfer is akin to Carl Woese’s genetic annealing model (Woese, 1998) and the ‘Darwinian threshold’ (Woese, 2002) where the annealing switches from horizontal to vertical gene transfer. Like Sinai et al. (2016), Woese (2002) aims to show that initially cellular evolution was primarily driven by an analog of sex (horizontal gene transfer) and only after gaining sufficient complexity was evolution able to switch to the vertical gene transfer that characterizes asexual reproduction. Although Sinai et al. (2016) cite the classic Woese (1967) book on the genetic code, they do not go into an explicit discussion of his work on this transition from horizontal to vertical gene transfer.
3. For an introduction to protocells, see Eric Bolo’s review of the Bianconi et al. (2013) work on abiogenesis, an earlier protocell project that Nowak was also involved with. For a quick overview of Albert Libchaber’s work on experimentally creating artificial protocells, see my old post on the algorithmic view of historicity and separation of scales in biology.
4. It is nice to see that both Wilf & Ewens (2010) and Sinai et al. (2016) acknowledge their debt to the analysis of algorithms. Wilf & Ewens (2010) reference the analysis of radix-exchange sorting including Knuth’s (1973) textbook analysis of it. Sinai et al. (2016) reference the analysis of skip lists (Pugh, 1990). Both are topics that computer science students would typically encounter in a course on probabilistic algorithms. This gives me even more hope for what theoretical computer science can offer biology, and I hope that cstheory professors will start to incorporate such examples from the foundational theory of evolutionary biology in future courses.
5. In fact, the landscape doesn’t even need to be that rigid. Sinai et al. (2016) use a rule similar to ‘random fitter mutant’, and so my analysis of semi-smooth fitness landscapes and the simplex algorithm (Kaznatcheev, 2013) also applies. This has the added benefit of being a result that doesn’t need the assumption of PLS being hard. Although since the Sinai et al. (2016) model considers only adaptive steps, even the general hardness result doesn’t need PLS being hard because we know that we can construct instances of weighted 2-SAT where any adaptive path to any local optimum is exponentially long from a random start.
6. I expect similar polynomial scaling to Sinai et al. (2016) for my model in the case of $0 < \delta < 1$ but I did not have either the space or time to verify this for this post. Of course, there would be roughly a $(1 - \delta)^{pN}$ (approximately $(1 + \delta)pN$ for small $\delta$ and p) slow down due to less frequent sampling in my model (1 versus pN per time-step). But we would also expect a much lower death rate $\delta$ in my model since, for me, $\frac{1}{\delta}$ measures the approximate time to encountering a new molecule while for Sinai et al. (2016) $\frac{1}{\delta}$ measures the approximate time to encountering another proto-cell full of approximately pN molecules.

Bansho, Y., Furubayashi, T., Ichihashi, N., & Yomo, T. (2016). Host–parasite oscillation dynamics and evolution in a compartmentalized RNA replication system. Proceedings of the National Academy of Sciences, 201524404.

Barton, N. H., & Charlesworth, B. (1998). Why sex and recombination? Science, 281(5385): 1986-1990.

Bianconi, G., Zhao, K., Chen, I.A., & Nowak, M.A. (2013). Selection for replicases in protocells. PLoS Computational Biology, 9(5).

Brosius, J. (2003). Gene duplication and other evolutionary strategies: from the RNA world to the future. Journal of Structural and Functional Genomics, 3(1-4): 1-17.

Brosius, J. (2005). Echoes from the past–are we still in an RNP world? Cytogenetic and Genome Research, 110(1-4): 8-24.

Heard, E., Tishkoff, S., Todd, J. A., Vidal, M., Wagner, G. P., Wang, J., Weigel, D., & Young, R. (2010). Ten years of genetics and genomics: what have we achieved and where are we heading? Nature Reviews Genetics, 11(10): 723-733.

Hogeweg, P., & Takeuchi, N. (2003). Multilevel selection in models of prebiotic evolution: compartments and spatial self-organization. Origins of Life and Evolution of the Biosphere, 33(4-5), 375-403.

Kauffman, S., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1): 11-45.

Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv preprint: 1308.5094.

Knuth, D.E. (1973). The Art of Computer Programming (Vol 3): Sorting and Searching. Addison-Wesley.

Livnat, A. (2013). Interaction-based evolution: how natural selection and nonrandom mutation work together. Biology Direct, 8(1): 1.

Livnat, A. & Papadimitriou, C. (2016). Evolution and learning: used together, fused together. A response to Watson and Szathmary. Trends in Ecology & Evolution, 31(12): 894-896.

Pugh, W. (1990). Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 33(6): 668-676.

Sinai, S., Olejarz, J., Neagu, I.A., & Nowak, M.A. (2016). Primordial sex facilitates the emergence of evolution. arXiv preprint: 1612.00825v1.

Stebbins, G. L. (1957). Self fertilization and population variability in the higher plants. The American Naturalist, 91(861): 337-354.

Van Valen, L. (1975). Group selection, sex, and fossils. Evolution, 87-94.

Vrijenhoek, R. C. (1989). Genetic and ecological constraints on the origins and establishment of unisexual vertebrates. Evolution and Ecology of Unisexual Vertebrates, 466, 24-31.

Watson, R.A. & Szathmary, E. (2016). How can evolution learn? Trends in Ecology & Evolution, 31(2): 147-157.

West, S. A., Lively, C. M., & Read, A. F. (1999). A pluralist approach to sex and recombination. Journal of Evolutionary Biology, 12(6): 1003-1012.

Williams, G. C. (1966; 8th edition 1996). Adaptation and Natural Selection. Princeton: Princeton University Press.

Wilf, H. S., & Ewens, W. J. (2010). There’s plenty of time for evolution. Proceedings of the National Academy of Sciences, 107(52): 22454-22456.

Woese, C. R. (1967). The Genetic Code. New York: Harper and Row.

Woese, C. R. (1998). The universal ancestor. Proceedings of the National Academy of Sciences, 95: 6854-6859.

Woese, C. R. (2002). On the evolution of cells. Proceedings of the National Academy of Sciences, 99(13): 8742-8747.

From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore.

### 11 Responses to Fusion and sex in protocells & the start of evolution

1. Andriy Marusyk says:

What is the difference between red and green arrows in the figure (lower panel)?

• No big difference between them. I just wanted to emphasize that they point in the other direction to the green ones. However, if you were imagining all-capitals as the fittest genotype and not too much epistasis (say there is a single peak, like for a semi-smooth landscape) then the red segments would be going downhill over a predominantly green uphill trend.

By the way, Andriy, I think you might enjoy Livnat’s paper that I reference in the article:

Livnat, A. (2013). Interaction-based evolution: how natural selection and nonrandom mutation work together. Biology Direct, 8(1): 1.

It is a bit (actually, way too) lengthy, but if you get a chance to glance at parts of it then I’d love to chat to you about your impressions.

• Andriy Marusyk says:

Thanks for the reply and pointing to the reference! I am highly interested in the subject of cell fusions and sexual recombination, as we are playing with these in the context of cancer evolution. As for your ideas regarding extending the study, I think you are making valid points. I find it difficult to imagine how things could work in reality one way or another. Even normal metabolism is impervious to my intuition (as things are more transient and everything is about fluxes), and things are much murkier in primordial protocells. Still, your idea of rugged landscapes and interacting subnetworks intuitively makes a lot of sense. Sub-networks (perhaps transient) rather than individual metabolites are probably the functional units required for the assembly of more complex self-replicating networks; this probably leads to different consequences regarding time and probability. Too bad there is no feasible way to test these things experimentally.

• Good points Andriy, does this mean that this idea will remain theoretical for the time being? I am guessing it might be possible to engineer something in vitro but it might be difficult to argue that this would be a lot less artificial than a math model.

• Andriy Marusyk says:

My point was more about the applicability of the framework of selection to metabolites and chemical reactions. Lots of good food for thought regarding the evolution of sexual recombination in Artem’s and the cited arguments, but I cannot imagine how this logic will work for metabolic networks.

• I was thinking of you and your talk on cancer cell fusion/un-fusion and phoenix cells while I was writing the post. But I couldn’t find a real way to connect to cancer in this post. Maybe in some future follow up.

My original excitement about your phoenix cells talk back in February was in how it relates to Julian Xue’s work on supply driven evolution:

Xue, J.Z., Costopoulos, A., & Guichard, F. (2015a). A Trait-based framework for mutation bias as a driver of long-term evolutionary trends. Complexity.

Xue, J.Z., Kaznatcheev, A., Costopoulos, A., & Guichard, F. (2015b). Fidelity drive: A mechanism for chaperone proteins to maintain stable mutation rates in prokaryotes over evolutionary time. Journal of Theoretical Biology, 364: 162-167.

In particular, how the mechanism you described in cancer cells seems to greatly lengthen the tail distribution of fitnesses for cells. The fused cancer cell has the benefit of storing much richer genetic variety in it (good for evolvability) but when it decides to divide, it has to unscramble the two genomes of its parent cells into daughter cells. Most cases do this unscrambling poorly and die (i.e. body/median of the fitness distribution is non-viable) but a few survive with either advantageous mutations or with the fusion state having allowed them to survive therapy stress (i.e. a few very high-fitness cells in the tail).

If somebody wanted to extend Sinai et al. (2016) to capture sex by modelling un-fusions (as I describe), then this is exactly the sort of puzzle they would be up against. So maybe that is a more direct connection to your work on phoenix cells?

As for experiments, I agree that it looks like we have no or little hope in the abiogenesis case. But I am more optimistic for an experimental system existing for studying the effects of fusion on the evolvability of cancer cells. Would you agree with my optimism? Or is it also an unreachable experimental system in cancer? If you agree then it’d be another great opportunity for using cancer to understand fundamental biology and thus something I could tease Jacob Scott with :D.

• andriy marusyk says:

Hi Artem, thank you for the additional links! Still going through the Livnat one, which is very interesting and thought-provoking. Selection for network interactions rather than individual alleles makes lots of sense. As for the cell fusions in cancers, we still have to address experimentally whether recombination happens. If it does, it opens up a lot of interesting questions.

• Yeah, I got confused by the choice of colors as well. I am used to green= go and red=stop.

• Yes, that is what I meant in my clarification for Andriy:

if you were imagining all-capitals as the fittest genotype and not too much epistasis … then the red segments would be going downhill over a predominantly green uphill trend.

In other words, if you were trying to get from ab to AB (which we suppose is higher fitness) and could only follow fitness increasing paths then you could follow a green arrow (it points in the direction you want to go), but a red arrow would stop you.

Sorry for not being sufficiently clear on the point.
