Programming playground: A whole-cell computational model

Three days ago, Jonathan R. Karr, Jayodita C. Sanghvi and coauthors in Markus W. Covert’s lab published a whole-cell computational model of the life cycle of the human pathogen Mycoplasma genitalium. This is the first model of its kind: they track all biological processes such as DNA replication, RNA transcription and regulation, protein synthesis, metabolism and cell division at the molecular level. To achieve this, the authors integrate 28 different sub-models of the known cellular processes.

Figure 1A from Karr, Sanghvi et al. (2012): A diagram of the 28 sub-models, colored by category: RNA (green), protein (blue), metabolic (orange), DNA (red). The modules are connected by arrows representing common metabolites (orange), RNA (green), proteins (blue), and DNA (red).

The key technical accomplishment was integrating the 28 modules into a single model. Each module is based on existing models, but different modules are expressed in different paradigms: ODE, Boolean, probabilistic, and constraint-based. For me, this is the most impressive aspect of the work. Usually, when I look at biology (or psychology), I see a mishmash of models, each expressed in its own language and seemingly incompatible with the others. The authors overcame this by assuming that the modules are independent on short timescales (under 1 second). This allows the software to keep track of 16 global cell variables, which are used as inputs for the sub-modules. Each sub-module is run to simulate 1 second, the results are used to update the global variables, and the loop repeats. The software is available online, and the authors used the simulation data to produce a video of a single cell’s life cycle.
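To make the loop concrete for myself, here is a minimal sketch of the integration idea in Python. It is not the authors' code: the three toy sub-models (`metabolism`, `transcription`, `translation`), their rates, and the three state variables are hypothetical stand-ins for the 28 real sub-models and 16 global variables. The rule for merging results (sum the proposed changes and clamp at zero) is also my own crude assumption, not something taken from the paper.

```python
import random
from typing import Callable, Dict, List

State = Dict[str, float]
# A sub-model reads a snapshot of the shared state and returns the
# changes it proposes for one time step (hypothetical interface).
SubModel = Callable[[State, float], State]


def metabolism(state: State, dt: float) -> State:
    # ODE-like toy: ATP is produced at a constant, made-up rate.
    return {"atp": 50.0 * dt}


def transcription(state: State, dt: float) -> State:
    # Probabilistic toy: up to 10 attempts per step, each succeeding
    # with probability 0.5 and costing 2 ATP per RNA made.
    attempts = int(min(10, state["atp"] // 2))
    made = sum(random.random() < 0.5 for _ in range(attempts))
    return {"rna": float(made), "atp": -2.0 * made}


def translation(state: State, dt: float) -> State:
    # Rule-based toy: each RNA yields one protein per step if 4 ATP
    # are available for it (RNA is not consumed in this toy).
    made = int(min(state["rna"], state["atp"] // 4))
    return {"protein": float(made), "atp": -4.0 * made}


def simulate(state: State, sub_models: List[SubModel],
             steps: int, dt: float = 1.0) -> State:
    state = dict(state)  # do not mutate the caller's dictionary
    for _ in range(steps):
        snapshot = dict(state)  # every sub-model sees the same inputs
        deltas = [model(snapshot, dt) for model in sub_models]
        # Assumed merge rule: add up all proposed changes per variable,
        # then clamp at zero so no pool goes negative.
        totals: Dict[str, float] = {}
        for delta in deltas:
            for name, change in delta.items():
                totals[name] = totals.get(name, 0.0) + change
        for name, change in totals.items():
            state[name] = max(0.0, state.get(name, 0.0) + change)
    return state


if __name__ == "__main__":
    random.seed(0)
    start = {"atp": 100.0, "rna": 0.0, "protein": 0.0}
    print(simulate(start, [metabolism, transcription, translation], steps=100))
```

Because each sub-model only sees the snapshot taken at the start of the step, two of them can spend the same ATP. How such conflicts get resolved is exactly the kind of detail a real integration has to settle, and the clamp here just hides the problem.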

The authors show that the model has a high level of agreement with existing data. They also use the predictions to run several novel real-biology experiments, and even partially overturn (or complete) a previous experimental observation based on hints from their model. In particular, they show that disruption of the lpdA gene, which Glass et al. (2006) suggested was non-essential, has a severe (but noncritical) impact on cell growth. I wish I could comment more on the validity of the model as judged by experiments, but molecular biology is magic to me.

The simulation results that excited me most were the effects of single-gene disruptions on phenotype. The bacterium Mycoplasma genitalium is a human urogenital parasite whose genome contains 525 genes (Fraser et al., 1995). It is not an easy model organism to work with, but it has the smallest known genome that can constitute a cell. Part of the team on this project is from the J. Craig Venter Institute and has extensive experience with the organism due to their effort to create the first self-replicating synthetic life by implanting artificial DNA into Mycoplasma genitalium. I would not be surprised if this model plays a vital part in the institute’s engineering.

Karr, Sanghvi et al. (2012) ran simulations of each of the 525 possible single-gene disruption strains. They found that 284 genes were essential to sustain growth and division and 117 were non-essential, a 79% agreement with the experimental results of Glass et al. (2006). Of particular interest to me was that in some cases it took more than one generation for specific proteins to fall to lethal levels. As far as I understand, this is because when a single cell divides, the daughters get a copy of the mother's DNA and also have their initial levels of proteins and RNA set to within statistical fluctuations of those of their mother. Due to my complete lack of basic biological background, this seemed an interesting example of Lamarckian evolution. In particular, it raises questions about how best to combine single-cell learning and evolution. From a naive Bayesian model of learning, it would seem that this would allow cells to pass on their priors: a biological-evolution counterpart to Beppu & Griffiths's (2009) cultural ratchet.
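The delayed lethality can be illustrated with a toy dilution model. This is my own sketch under strong assumptions, not the mechanism as implemented in the whole-cell model: the disrupted gene produces no new protein, existing protein never degrades, each molecule goes to the followed daughter with probability 1/2 at division, and the cell dies once the count drops below a hypothetical `lethal_threshold`. All names and numbers are made up for illustration.

```python
import random


def generations_until_lethal(initial_copies: int, lethal_threshold: int,
                             rng: random.Random,
                             max_generations: int = 64) -> int:
    """Follow one lineage and count divisions until the inherited
    protein pool falls below the (hypothetical) lethal threshold."""
    copies = initial_copies
    for generation in range(max_generations + 1):
        if copies < lethal_threshold:
            return generation
        # Binomial partitioning: each molecule independently ends up
        # in the daughter we follow with probability 1/2.
        copies = sum(rng.random() < 0.5 for _ in range(copies))
    return max_generations


def expected_generations(initial_copies: int, lethal_threshold: int) -> int:
    """Deterministic version: halve the pool exactly at each division.
    Assumes lethal_threshold > 0."""
    generations = 0
    level = float(initial_copies)
    while level >= lethal_threshold:
        level /= 2
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(0)
    print(expected_generations(1000, 10))
    print(generations_until_lethal(1000, 10, rng))
```

In this toy, a protein that starts with many more copies than the cell strictly needs takes roughly log2(initial/threshold) divisions to run out, so the disruption only shows up in the phenotype several generations later.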

The detail of the whole-cell model is impressive. I hope that the software becomes a tool for theorists without access to a wet-lab to play around with cells. The approach is the antithesis of the simple and completely unrealistic models I am accustomed to building. For me, it raises many thoughts on how to better think about the distinction between genotype and phenotype that is almost always ignored in evolutionary game theory. For now, the whole-cell model is too computationally expensive for me to build evolutionary dynamics from it, but maybe parts of the code can be simplified or ignored, or maybe we could use more coarse-grained models. Either way, I am excited for my new playground!


Beppu, A., & Griffiths, T. (2009). Iterated learning and the cultural ratchet. Proceedings of the 31st Annual Conference of the Cognitive Science Society, 2089-2094.

Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A., Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley, J.M., et al. (1995). The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403.

Glass, J.I., Assad-Garcia, N., Alperovich, N., Yooseph, S., Lewis, M.R., Maruf, M., Hutchison, C.A., Smith, H.O., & Venter, J.C. (2006). Essential genes of a minimal bacterium. Proc. Natl. Acad. Sci. USA 103, 425–430.

Karr, J.R., Sanghvi, J.C., Macklin, D.N., Gutschow, M.V., Jacobs, J.M., Bolival, B., Assad-Garcia, N., Glass, J.I., & Covert, M.W. (2012). A whole-cell computational model predicts phenotype from genotype. Cell 150, 389–401. DOI: 10.1016/j.cell.2012.05.044

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

13 Responses to Programming playground: A whole-cell computational model

  1. Jim Birch says:

    It took a while but we are finally on the cusp of real molecular biology…

  3. jonsca says:

    I’m not sure it’s logical to do all of the numerical integration on the same timescale. This would assume a central oscillator dictating the pace in the cell and may be a fallacy, as I’m sure that many of the biochemical reactions are occurring asynchronously.

    • This is a very good point. However, I don’t think they meant to imply that there is a central synchronizing clock; they just did it for the sake of simplicity (otherwise they wouldn’t be integrating sub-modules into one computational model, but would be building a whole new computational model). One of the issues that they really didn’t make clear (and I should just buckle down and look at their code) is how they decide the new input values at the next time step if several sub-modules returned different values for the same macro-molecules on the previous time step.

      They could clean up this issue a little bit by running each sub-process for some time picked uniformly at random from 0 to 2 seconds. This would break the syncing of the processes. However, it would make updating global macro-molecule values even more difficult, because the initial macro-molecule values of one sub-process could be changed by another sub-process while the first one is still running. It all comes down to how they resolve conflicts in return values from the interacting sub-modules.

      As for the arbitrary value of 1 second for the clock in their model (or for the mean in my variant), it would be nice to have some real justification. For instance, the model assumes that the macro-molecules are uniformly distributed through the cell. This is probably a reasonable assumption, but not at all timescales: what is the mixing time in a cell? (Maybe I should ask on bio.SE.) How long do we have to wait after some local disturbance before we can assume that the macro-molecules are uniformly-ish distributed again? Any timescale we pick has to be an order of magnitude above this mixing time to make the uniform distribution assumption reasonable. You could probably work around this too, though, just by adding some noise.

      Oh noise… it is such a great way to hide ignorance :D.
