Algorithmic information theory (AIT) allows us to study the inherent structure of objects, and qualify some as ‘random’ without reference to a generating distribution. The theory originated when Ray Solomonoff (1960), Andrey Kolmogorov (1965), and Gregory Chaitin (1966) looked at probability, statistics, and information through the algorithmic lens. Now the theory has become a central part of theoretical computer science, and a tool with which we can approach other disciplines. Chaitin uses it to formalize biology.
In 2009, he originated the new field of metabiology, a computation theoretic approach to evolution (Chaitin, 2009). Two months ago, Chaitin published his introduction and defense of the budding field: Proving Darwin: Making Biology Mathematical. His goal is to distill the essence of evolution, formalize it, and provide a mathematical proof that it ‘works’. I am very sympathetic to this goal.
Chaitin’s conviction that evolution can be formalized stems from his deeply Platonic view of the world. Since evolution is so beautiful and ubiquitous, there must be a pure perfect form of it. We have to look past the unnecessary details and extract the essence. For Chaitin this means ignoring everything except the genetic code. The physical form of the organisms is merely a vessel and tool for translating genes into fitness. Although this approach might seem frightening at first, it is not foreign to biologists; Chaitin is simply taking the gene-centered view of evolution.
As for the genetic code, Chaitin takes the second word very seriously; genes are simply software to be transformed into fitness by the hardware that is the physics of the organisms’ environment. The only things that inhabit this formal world are self-delimiting programs — a technical term from AIT meaning that no extension of a valid program is valid. This allows us to define a probability measure over programs written as finite binary strings, which will be necessary in the technical section.
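To make the self-delimiting condition concrete, here is a minimal sketch of why it gives us a probability measure. If no valid program is a prefix of another, then weighting each program p by 2^(-|p|) gives total weight at most 1 (the Kraft inequality). The toy program sets and the function names below are my own illustrations, not anything from Chaitin's formalism.

```python
from fractions import Fraction


def is_prefix_free(programs):
    """Return True if no program is a prefix of a different program."""
    ordered = sorted(set(programs))
    # In lexicographic order, a string that is a prefix of any other
    # string in the set is also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def kraft_sum(programs):
    """Total weight of the set when each program p gets weight 2^(-len(p))."""
    return sum(Fraction(1, 2 ** len(p)) for p in set(programs))


# Hypothetical toy "languages" of binary programs.
complete = {"0", "10", "110", "111"}   # prefix-free, weights sum to 1
partial = {"0", "10"}                  # prefix-free, weights sum to 3/4
broken = {"0", "01"}                   # "0" is a prefix of "01"

for name, progs in [("complete", complete), ("partial", partial), ("broken", broken)]:
    print(name, is_prefix_free(progs), kraft_sum(progs))
```

For a real universal machine the set of valid programs is infinite and the sum is strictly less than 1, but the finite case shows the mechanism.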
Physics simply runs the programs. If the program halts then the natural number it outputs is the program’s fitness. In other words, we have a perfectly static environment. If you were interested in ecology or evolutionary game theory, then Chaitin just threw you out with the bath water. If you were interested in modeling, and wanted to have something computable define your fitness, then tough luck. Finally, in a fundamental biological theory, I would expect fitness to be something we measure when looking at the organisms, not a fundamental quantity inherent in the model. In biology, a creature simply reproduces or doesn’t, survives or doesn’t; fitness is something the observer defines when reasoning about the organisms. Why does Chaitin not derive fitness from more fundamental properties like reproduction and survival?
In Chaitin’s approach there is no reproduction; there is only one organism mutating through time. If you are interested in population biology or speciation, then you can’t look at them in this model. The mutations are not point mutations, but what Chaitin calls algorithmic mutations. An algorithmic mutation actually combines the acts of mutating and selecting into one step: it is a $k$-bit program that takes the current organism A and outputs a new organism B of higher fitness (note that it needs an oracle call for the halting problem to do so). The probability that A is replaced by B is then $2^{-k}$. There is no way to decouple the selection step from the mutation step in an algorithmic mutation, although this is not clear without the technical details, which I will postpone until a future post. Chaitin’s model does not have random mutations; it has randomized directed mutations. Fitness as a basic assumption, a static environment, and directed mutations make this a teleological model: a biologist’s nightmare.
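One way to picture how a mutation gets picked is to flip fair coins until the bits spell a valid self-delimiting program. A rough sketch follows, assuming that a k-bit mutation is chosen with probability 2^(-k), the natural measure on self-delimiting programs. The four-word set `MUTATIONS` is a hypothetical stand-in for real mutation programs, and nothing here models the halting oracle or the fitness comparison.

```python
import random

# Hypothetical toy prefix-free set standing in for mutation programs.
MUTATIONS = ("0", "10", "110", "111")


def draw(codewords, next_bit):
    """Read bits from next_bit() until they spell a codeword and return it.

    Returns None if the bits read so far cannot be extended to any codeword.
    """
    prefix = ""
    while True:
        prefix += str(next_bit())
        if prefix in codewords:
            return prefix
        if not any(c.startswith(prefix) for c in codewords):
            return None


def empirical_frequencies(codewords, trials, seed=0):
    """Estimate how often each codeword is drawn from fair coin flips."""
    rng = random.Random(seed)
    counts = {c: 0 for c in codewords}
    for _ in range(trials):
        word = draw(codewords, lambda: rng.getrandbits(1))
        if word is not None:
            counts[word] += 1
    return {c: counts[c] / trials for c in codewords}


freqs = empirical_frequencies(MUTATIONS, trials=20000)
for word in MUTATIONS:
    # Compare the observed frequency with the target 2^(-len(word)).
    print(word, round(freqs[word], 3), 2.0 ** -len(word))
```

Because the set is prefix-free, the coin flips never overshoot one valid program into another, which is exactly what lets program length double as a probability.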
What does Chaitin achieve? His primary result is to show biological creativity, which in this model means a constant (and fast) increase in fitness. His secondary result is to delineate between three types of design: blind search, evolution, and intelligent design. He shows that to arrive at an organism that has the maximum fitness of any $n$-bit organism (this is the number $BB(n)$, the $n$th busy beaver number), blind search requires on the order of $2^n$ steps, evolution requires between $n^2$ and $n^3$, and intelligent design (that selects the best algorithmic mutation at each step) requires $n$ steps. These are interesting questions, but what do they have to do with Darwin?
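To get a feel for how far apart the three regimes are, here is a small sketch that tabulates the orders of growth as I understand Chaitin's results: about 2^n steps for blind search, between n^2 and n^3 for evolution, and n for intelligent design. Constant factors are ignored, and the function name `step_orders` is my own.

```python
def step_orders(n):
    """Orders of growth (constants ignored) for reaching the fittest n-bit organism."""
    return {
        "blind search": 2 ** n,
        "evolution (lower)": n ** 2,
        "evolution (upper)": n ** 3,
        "intelligent design": n,
    }


for n in (10, 20, 40):
    print(n, step_orders(n))
```

At n = 10, blind search (1024) still sits between evolution's bounds (100 and 1000 are both below it only barely); from around n = 20 onward it pulls away from all of them.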
Does Chaitin prove Darwin?
We are finally at the central question of this post. To answer this, we need to understand what Darwin achieved. The best approach is to look at Mayr’s (1982) five facts and three inferences that define Darwin’s natural selection:
- Fact 1: Populations increase exponentially if all agents get to reproduce.
Metabiology: There is a single agent that doesn’t reproduce.
- Fact 2: Population is stable except for occasional fluctuations.
Metabiology: There is always exactly one agent, so the population is trivially stable.
- Fact 3: Resources are limited and relatively constant.
Metabiology: Resources are not defined.
- Inference 1: There is a fierce competition for survival with only a small fraction of the progeny of each generation making it to the next.
Metabiology: Every successful mutation makes it to the next generation.
- Fact 4: No two agents are exactly the same.
Metabiology: There is only one agent.
- Fact 5: Much of this variation is heritable.
Metabiology: Nothing is heritable; a new mutant has nothing to do with the previous agent except having a higher fitness.
- Inference 2: Survival depends in part on the heredity of the agent.
Metabiology: A mutant is created/survives only if more fit than the focal agent.
- Inference 3: Over generations this produces continual gradual change.
Metabiology: Agent constantly improves in fitness.
The only thing to add to the above list is the method for generating variation: random mutation. As we saw before, metabiology uses directed mutation. From the above, it mostly seems like Chaitin and Darwin were concerned with different things. Chaitin doesn’t prove Darwin.
However, I don’t think Chaitin’s exercise was fruitless. I think it is important to try to formalize the basic essence of evolution, and to prove theorems about it. But Chaitin needs to remember what made his development of algorithmic information theory so successful: AIT was able to address existing questions of interest in novel ways. So the lesson of this post is to concentrate on the questions biologists want to answer (or have answered already) when building a formal model. Make sure that your formal approach can at least express some of the questions a biologist would want to ask.
Chaitin, G. (1966). On the Length of Programs for Computing Finite Binary Sequences. J. Association for Computing Machinery 13(4): 547–569.
Chaitin, G. (2009). Evolution of Mutating Software. EATCS Bulletin 97: 157–164.
Kolmogorov, A. (1965). Three approaches to the definition of the quantity of information. Problems of Information Transmission 1: 3–11.
Mayr, E. (1982). The Growth of Biological Thought. Harvard University Press. ISBN 0-674-36446-5.
Solomonoff, R. (1960). A Preliminary Report on a General Theory of Inductive Inference. Technical Report ZTB-138, Zator Company, Cambridge, Mass.