Preservation of duplicate genes by complementary, degenerative mutations

For reading group 22 (January 19th), Julian presented the title paper, and I decided to summarize the main findings as a non-biologist:

Force A, Lynch M, Pickett FB, Amores A, Yan YL, & Postlethwait J (1999). Preservation of duplicate genes by complementary, degenerative mutations. Genetics, 151 (4), 1531-45 PMID: 10101175

When looking closely at genomics, biologists go beyond the simple abstractions evolutionary game theorists use. In particular, instead of associating genotypes with a simple label and allowing arbitrary mutations from label to label, a biologist actually tries to think about the underlying mechanism. This leads to questions we would never consider in pure EGT, such as gene duplication. For our reading group, gene duplication is of interest for two reasons:

  1. Most new functions come out of gene duplication, and new functions are essential if we want to talk about the evolution of complexity
  2. The mechanism behind gene duplication is also one of the standard mechanisms for generating scale-free networks, and thus offers an explanation for why gene-interaction networks are scale-free

The paper needs to make some simplifying assumptions to start. The two crucial ones are:

  1. Having two copies of the gene does not increase the output of corresponding proteins, and
  2. Having redundant copies does not hurt the organism at all, and thus the redundancy can drift to fixation

With these assumptions we can generate the classical model. In the classical model, there is redundancy because a second copy shields the original function from deleterious mutations (which are much more common than beneficial ones). Unfortunately, this predicts that duplicates degenerate much faster than is seen in the real world.

The goal of Force et al. is to present a new conceptual framework for understanding the evolution of duplicate genes and why duplicates are preserved in higher quantities than the classical model predicts. The key to the duplication-degeneration-complementation (DDC) model is how we represent a gene. We think of the gene as having two parts:

  1. A coding part that stores most of the gene. A hit to this region completely destroys the gene and, if there is no redundant copy, leads to the death of the organism
  2. A regulatory part that stores many subfunctions. A hit to a subfunction makes the code inactive for that subfunction on that gene. If a specific subfunction is knocked out for every copy of the gene, then the organism dies. The intuitive way to think about subfunctions is as triggers that activate the coding section; among all the gene copies you want to maintain at least one of each type of trigger
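
To make the representation concrete, here is a minimal sketch in Python. It is my own illustration, not code from the paper: the names `GeneCopy` and `is_viable` are made up, and I am assuming that a copy with a hit coding region contributes nothing, so the organism survives only if every subfunction is still present on at least one copy with an intact coding region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCopy:
    """One copy of a gene (hypothetical representation, not from the paper)."""
    coding_intact: bool      # False once the coding region has taken a hit
    subfunctions: frozenset  # regulatory "triggers" this copy still has


def is_viable(copies, required):
    """Assumed viability rule: every required subfunction must be present
    on at least one copy whose coding region is still intact."""
    covered = set()
    for copy in copies:
        if copy.coding_intact:
            covered |= copy.subfunctions
    return required <= covered


required = frozenset({"a", "b"})

# A single intact copy with both triggers is fine.
print(is_viable([GeneCopy(True, required)], required))

# Two copies that each kept a different trigger are fine together...
pair = [GeneCopy(True, frozenset({"a"})), GeneCopy(True, frozenset({"b"}))]
print(is_viable(pair, required))

# ...but a coding hit to either of them is now lethal.
broken = [GeneCopy(True, frozenset({"a"})), GeneCopy(False, frozenset({"b"}))]
print(is_viable(broken, required))
```

The last two cases are the situation the rest of the post is about: once the triggers are split between the copies, neither copy can be lost.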

This second point is where the magic is, and where the DDC model differs from the standard one. The above gene representation allows for three (surprise!) things to happen to genes after they have been duplicated:

  • Nonfunctionalization: a copy experiences a null mutation that destroys the original function of the gene. An example would be a hit to the coding region.
  • Neofunctionalization: a copy experiences a positive mutation that creates a new function which fixes in the population by selection. Since negative mutations are much more likely, we assume that neofunctionalization almost never happens.
  • Subfunctionalization: both copies lose parts of their regulatory mechanisms, such that neither has sufficient regulatory mechanism on its own to cover every subfunction of the gene. However, together they retain enough regulatory mechanisms to cover all of them.
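
As a toy illustration of how the first and third fates compete, here is a small Monte Carlo sketch. It is a deliberately simplified stand-in of my own and not the model analyzed in the paper: I assume each mutation hits one of the two copies uniformly at random, is a null mutation with probability `p_null` and otherwise removes one randomly chosen subfunction, and that lethal mutations are simply rejected. Neofunctionalization is left out, in line with the assumption that it almost never happens. The names `simulate_pair` and `fate_frequencies` are hypothetical.

```python
import random
from collections import Counter


def simulate_pair(n_sub, p_null, rng):
    """Degrade one duplicate pair until its fate is resolved (toy model)."""
    required = frozenset(range(n_sub))
    # Each copy is the set of subfunctions it still has; a null
    # mutation (e.g. a coding hit) is modeled as the empty set.
    copies = [set(required), set(required)]
    while True:
        a, b = copies
        if not a or not b:
            return "nonfunctionalization"
        if a != required and b != required:
            # Neither copy suffices alone, but together they cover everything.
            return "subfunctionalization"
        i = rng.randrange(2)
        if rng.random() < p_null:
            mutated = set()
        else:
            mutated = copies[i] - {rng.randrange(n_sub)}
        # Assumed purifying selection: reject any mutation that would
        # leave some subfunction missing from both copies.
        if mutated | copies[1 - i] == required:
            copies[i] = mutated


def fate_frequencies(n_sub, p_null, trials=10_000, seed=0):
    """Count how often each fate occurs over many independent pairs."""
    rng = random.Random(seed)
    return Counter(simulate_pair(n_sub, p_null, rng) for _ in range(trials))


for n_sub in (1, 2, 5):
    print(n_sub, dict(fate_frequencies(n_sub, p_null=0.2)))
```

With a single subfunction the pair can only end in nonfunctionalization, since there is nothing to split between the copies. With more subfunctions, complementary losses become possible, which is the intuition behind the DDC model preserving more duplicates than the classical one.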

Subfunctionalization locks the two copies in by complementary, degenerative mutations. If the coding region of either copy takes a hit, the organism dies. Therefore, in the population both copies will have to continue to be expressed. This mechanism makes it much harder for genes to degrade and allows us to maintain the higher amounts of gene duplicates observed in nature. The logical end to subfunctionalization (which is not discussed in the paper) is when a gene is duplicated and undergoes subfunctionalization to the point that every copy has only one subfunction remaining in the regulatory section. This prediction of the model, unfortunately, is not observed in nature.