## Computer science on prediction and the edge of chaos

With the development of statistical mechanics, physicists became the first agent-based modellers. Since the scientists of the 19th century didn’t have super-computers, they couldn’t succumb to the curse of computing and had to come up with analytic treatments of their “agent-based models”. These analytic treatments were often not rigorous, and only a heuristic correspondence was established between the dynamics of macro-variables and the underlying microdynamical implementation. Right before lunch on the second day of the Natural Algorithms and the Sciences workshop, Joel Lebowitz sketched how — for some models — mathematical physicists still continue their quest to rigorously show that macrodynamics faithfully reproduce the aggregate behavior of the microstates. In this way, they continue to ask the question: “when can we trust our analytic theory and when do we have to simulate the agents?”

What are the fundamental constraints on prediction? Theoretical computer scientists are probably the best equipped to answer this question in the general case. The day before Lebowitz, Cris Moore held the position of standing between the workshop participants and their lunch. He used this opportunity to explain how physics, the problem of prediction, and computational complexity relate. In computational complexity, we study what sort of problems can be solved with a limited resource such as time, space, communication, or even randomness. To start us off on the road to prediction vs. simulation, Moore considered the question of parallelization vs. serial processing ($\mathsf{NC}$ vs. $\mathsf{P}$), and finding vs. verifying an answer ($\mathsf{P}$ vs. $\mathsf{NP}$). He urged us to explore the problems in the gap between $\mathsf{NC}$ and $\mathsf{P}\mathrm{-complete}$, and $\mathsf{P}$ and $\mathsf{NP}\mathrm{-complete}$. These are the messy problems, not structured enough to encode computers, but not simple enough to fall in the lower complexity classes.

In the last talk of day one, Mark Braverman considered two more obstacles to prediction: chaos and universality. Chaos theory is a well-developed field of complexology that is familiar to physicists and biologists, and its limits on prediction stem from the high variance in behavior between paths with similar initial conditions. To exactly describe the behavior of a chaotic system over any significant time-horizon, the initial conditions have to be extremely well measured. In computer science terms: the initial measurement needs an amount of precision exponential in the time-horizon to which we want to predict the system. This is why we can’t predict the exact weather in a specific location to a high degree of accuracy more than a few days in the future. However, we are seldom interested in this level of precision from our models; usually we mean something else by prediction. A typical prediction might be: if we keep pumping greenhouse gases into the atmosphere then in 100 years it will be on average far warmer on the planet. Chaotic systems do not place any fundamental constraints on such qualitative predictions; in fact, they often have easy-to-describe long-term qualitative behavior. Universal systems pose more of a challenge.

Rice’s theorem states that for any non-trivial property (i.e. one that is not constant across all languages), there is no algorithm that decides if the language of a given Turing machine satisfies that property. For natural algorithms, this means that if your model is universal (there is a way to use the initial conditions and parameters of your model to encode an arbitrary Turing machine) then, in general, even qualitative statements about the long-term behavior of your system are impossible. If you are a modeler and you build a model that is universal, then you’ve simply built in too much freedom: you can recreate anything and predict nothing. As with Moore, there are problems in the gap: undecidable models that are too unstructured to encode arbitrary Turing machines and thus not universal, but still not restricted enough to be predictable.

Universality of models is not uncommon; Bernard Chazelle’s influence systems from the opening remarks can display both chaos and universality. In Chazelle’s model of diffusive systems, agents keep interacting with each other — contributing to the entropy of the system — and they keep trying to move towards their neighbors — contributing to the energy. If the parameters are such that entropy and energy balance perfectly then you have criticality and features like chaos and Turing universality. However, this perfect entropy-energy balance is a knife-edge phenomenon: small disturbances tip you one way or another and the system becomes non-universal and potentially predictable. The set of parameter settings for which influence systems are unpredictable has measure zero, and a very complicated self-similar structure.

Braverman’s example had similar behavior. He considered whether Julia sets — the boundary of the set of points $z$ for which iterating the map $z \rightarrow z^2 + c$ does not escape to infinity — can be computed given a description of $c$. He found that although some Julia sets are not computable, this undecidability is not robust: applying small noise to $c$ results in Julia sets that are computable. Like in influence systems, the undecidable Julia sets are knife-edge but a beautifully structured knife: they live on the boundary of the Mandelbrot set.
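To make the object concrete, here is a minimal escape-time approximation of a filled Julia set in Python. It is only a finite-precision, finite-iteration caricature of the sets Braverman studied, and the particular value of $c$, the grid resolution, and the iteration count are my own illustrative choices rather than anything from his work.

```python
import numpy as np

def bounded_orbits(c, grid_size=400, max_iter=200, escape_radius=2.0):
    """Escape-time approximation of the filled Julia set of z -> z^2 + c.

    Points whose orbit stays within escape_radius for max_iter steps are
    treated as bounded; the Julia set itself is the boundary of this set.
    """
    xs = np.linspace(-2, 2, grid_size)
    z = xs[None, :] + 1j * xs[:, None]          # a grid over [-2, 2]^2
    bounded = np.ones(z.shape, dtype=bool)
    for _ in range(max_iter):
        z = np.where(bounded, z * z + c, z)     # freeze points once they escape
        bounded &= np.abs(z) <= escape_radius
    return bounded

# An illustrative parameter value; nudging c slightly changes which points
# look bounded at this finite resolution and iteration count.
print(bounded_orbits(-0.12 + 0.74j).sum())
print(bounded_orbits(-0.12 + 0.74j + 1e-3).sum())
```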

In other words, if — instead of doing worst-case analysis — we look at the smoothed analysis of influence systems and Julia sets then they are predictable. It is a matter of allowing small random perturbations around the worst-case inputs: combining the best of worst-case and average-case analysis. Does this mean we can bring back statistical mechanics? The biggest intersection between statistical mechanics and traditional theoretical computer science is in looking at phase transitions in the $\mathsf{NP}\mathrm{-complete}$ problem 3SAT. If we define the density $\alpha$ of a 3SAT instance as the ratio of the number of clauses to the number of variables, then as we increase $\alpha$ we transition from a regime where most instances are satisfiable to one where almost none are. Near the critical value $\alpha_c$, we see the 3SAT instances that are hardest to solve, another edge of chaos. The study of this phase transition can be incredibly rewarding, but as Moore pointed out, it cannot provide us with fundamental insights into the difficulty of prediction. The 3XOR problem — a problem that is easily solvable in polynomial time since it is just linear equations over $\mathbb{Z}_2$ — has exactly the same statistical structure. Although statistical mechanics was the first field to build an analytic understanding of agent-based models, it has to pass the reins to theoretical computer scientists.
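For readers who want to see the transition, here is a small, hedged sketch: it brute-forces random 3-SAT instances at a few clause-to-variable ratios. With only ten variables the transition is blurred by finite-size effects (it only becomes sharp as the number of variables grows), and the ratios and sample sizes below are arbitrary choices for a quick run.

```python
import itertools
import random

def random_3sat(n_vars, n_clauses, rng):
    """A random 3-SAT instance: clauses of 3 distinct variables, each negated
    with probability 1/2. Literal v+1 means variable v, -(v+1) its negation."""
    return [tuple(rng.choice([v + 1, -(v + 1)])
                  for v in rng.sample(range(n_vars), 3))
            for _ in range(n_clauses)]

def satisfiable(clauses, n_vars):
    """Brute-force check over all assignments (fine for small n_vars)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

rng = random.Random(0)
n, trials = 10, 20
for alpha in (2.0, 3.0, 4.0, 4.27, 5.0, 6.0):   # clause-to-variable ratio
    sat = sum(satisfiable(random_3sat(n, int(alpha * n), rng), n)
              for _ in range(trials))
    print(f"alpha = {alpha:.2f}: {sat}/{trials} instances satisfiable")
```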

Note: this is the second blogpost of a series (first here) on the recent workshop on Natural Algorithms and the Sciences at Princeton’s Center for Computational Intractability.

Chazelle, B. (2012). Natural algorithms and influence systems. Communications of the ACM, 55(12). DOI: 10.1145/2380656.2380679

## Quasi-delusions and inequality aversion

Patient M: It’s impossible — no one could urinate into that bottle — at least no woman could. I’m furious with her [these are the patient's emphases] and I’m damned if I am going to do it unless she gives me another kind of bottle. It’s just impossible to use that little thing.

Analyst: It sounds as if a few minutes of communication with the nurse could clear up the realistic part of the difficulty—is there some need to be angry with the nurse and keep the feeling that she has done something to you?

Patient M: The ‘impossibility’ of using the bottle could be gotten over by using another—or I could use a funnel or a plastic cup and pour it into the bottle. But I just won’t. It makes me so mad. If she wants that sample, she is going to have to solve that problem. [Sheepishly] I know how irrational all this is. The nurse is really a very nice person. I could easily talk to her about this, and/or just bring in my own container. But I am really so furious about it that I put all my logic and knowledge aside and I feel stubborn—I just won’t do it. She [back to the emphasis] can’t make me use that bottle. She gave it to me and it’s up to her to solve the problem.

The above is an excerpt from a session between psychoanalyst Leonard Shengold (1988) and his patient. The focus is on the contrast between M’s awareness of her delusion and her continued anger and frustration. Rationally and consciously she knows that there is no reason to be angry at the nurse, and yet some unconscious, emotional impulse pushes her into feelings that produce a behavior she herself can recognize as irrational. This is a quasi-delusion.

Marcel presented a specific type of quasi-delusion in an earlier post on quasi-magical thinking. This is a mistake of inference, where an agent realizes that their action cannot change the (often simultaneous) action of another agent, and yet they act in a way that is not consistent with this view. They act as if their action will alter the choice made by their partner. I formalized Marcel’s discussion in the context of learning in our objective versus subjective rationality framework. The goal of this post is to do the same with quasi(mis)representations of the objective game.

We consider agents that interact with each other in an objective one-shot, cooperate-defect game given by the payoff matrix $\begin{pmatrix} 1 & X \\ Y & 0 \end{pmatrix}$ (with $Y \geq X$). Suppose that the agent is aware of the game, but exhibits inequality or fairness biases (Fehr & Schmidt, 1999). If the agent is sensitive to disadvantageous and advantageous inequality with strengths $\alpha$ and $\beta$ respectively, then they will act as if the game is $\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$, where:

$U = X - \alpha(Y - X)$ and $V = Y - \beta(Y - X)$

This means that if we know their behavior and the objective game, then we can infer the inequality biases $\alpha$ and $\beta$ as:

$\alpha = \frac{X - U}{Y - X}$ and $\beta = \frac{Y - V}{Y - X}$
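As a sanity check of the algebra, here is a minimal sketch of the transform and its inverse. The payoff values and bias strengths are arbitrary illustrative numbers, not estimates from any of the experiments discussed below.

```python
def subjective_game(X, Y, alpha, beta):
    """Map the objective off-diagonal payoffs (X, Y) to the payoffs (U, V)
    that an inequality-averse agent acts on (Fehr & Schmidt, 1999).

    alpha: sensitivity to disadvantageous inequality (the agent is behind),
    beta:  sensitivity to advantageous inequality (the agent is ahead).
    """
    U = X - alpha * (Y - X)
    V = Y - beta * (Y - X)
    return U, V

def inferred_biases(X, Y, U, V):
    """Invert the transform: recover (alpha, beta) from observed behavior."""
    return (X - U) / (Y - X), (Y - V) / (Y - X)

# Illustrative numbers only: a game with X = -0.5, Y = 1.5 and a mildly
# inequality-averse agent.
U, V = subjective_game(-0.5, 1.5, alpha=0.8, beta=0.3)
print(U, V)                               # the game the agent acts as if it faces
print(inferred_biases(-0.5, 1.5, U, V))   # recovers (0.8, 0.3)
```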

The best part about these inequality biases is that they are directly observable in humans, and even in monkeys! My favorite example is Brosnan & de Waal’s (2003) experiment with capuchin monkeys. These monkeys eat both cucumber slices and grapes as snacks, but strongly prefer grapes over cucumbers. When a monkey is given a cucumber as a reward for an easy task, he happily gobbles it up. However, after seeing his friend receive a better reward — a grape! — for the same task, the monkey becomes angry. The next time he performs the task and receives a cucumber, he is extremely unhappy and rejects the treat. It is best to watch it for yourself (excerpt from a TED talk with Frans de Waal narrating):

In humans, Tricomi et al. (2010) directly observed both advantageous and disadvantageous inequality aversion with fMRI. Human inequality aversion might not be innate or universal: children lack it at ages 3-4 but develop an ethnocentric version of inequality aversion by ages 7-8 (Fehr et al., 2008). In these experimental studies, unfortunately, both the monkeys and humans were not playing a game against each other, but receiving fixed rewards from the experimentalist. To gain direct experimental insights into inequality aversion in games, we have to turn to the economics literature, where the results are much more mixed. This quasi-delusion is observed by some (Anderson et al., 2008) but not other (Sadrieh & Verbon, 2006; Hofmeyr et al., 2007) studies, depending on the specifics of the public-goods game in consideration. Hopefully, careful computational modeling can help us understand this literature better.

### References

Anderson, L. R., Mellor, J. M., & Milyo, J. (2008). Inequality and public good provision: An experimental analysis. The Journal of Socio-Economics, 37(3), 1010-1028.

Brosnan, S. F. & de Waal, F. B. M. (2003). Monkeys reject unequal pay. Nature, 425: 297-299.

Fehr, E., & Schmidt, K. (1999). A theory of fairness, competition, and cooperation. The Quarterly Journal of Economics, 114(3), 817-868. DOI: 10.1162/003355399556151

Fehr, E., Bernhard, H., & Rockenbach, B. (2008). Egalitarianism in young children. Nature, 454(7208), 1079-1083.

Hofmeyr, A., Burns, J., & Visser, M. (2007). Income inequality, reciprocity and public good provision: an experimental analysis. South African Journal of Economics, 75(3), 508-520.

Sadrieh, A., & Verbon, H. A. (2006). Inequality, cooperation, and growth: An experimental study. European Economic Review, 50(5), 1197-1222.

Shengold, L. (1988). Quasi-delusions: a brief communication. International journal of Psychoanalysis, 69(4): 471-473. [full text]

Tricomi, E., Rangel, A., Camerer, C. F., & O’Doherty, J. P. (2010). Neural evidence for inequality-averse social preferences. Nature, 463(7284), 1089-1091.

## Algorithmic view of historicity and separation of scales in biology

A Science publication is one of the best ways to launch your career, especially if it is based on your undergraduate work, part of which you carried out with makeshift equipment in your dorm! That is the story of Thomas M.S. Chang, who in 1956 started experiments (partially carried out in his residence room in McGill’s Douglas Hall) that led to the creation of the first artificial cell (Chang, 1964). This was — in the words of the 1989 New Scientist — an “elegantly simple and intellectually ambitious” idea that “has grown into a dynamic field of biomedical research and development.” A field that promises to connect biology and computer science by physically realizing John von Neumann’s dream of a self-replicating machine.

Although not the first presentation of the day, Albert Libchaber’s after-lunch talk shared progress on this quest with the participants of the 2nd Natural Algorithms and the Sciences workshop. Libchaber shared his group’s bottom-up approach to artificial biology, working toward the goal of experimentally realizing a minimal self-replicating cell. He showed how his group is able to create a self-organizing lipid bilayer through a process no more complicated than mixing your salad dressing. These simple artificial cells were able to persist for many hours, and could even be seeded with plasmid DNA (although the group is not yet equipped to look at the transfer of plasmids between membranes). But most stunning for me was how they could induce replication through fission in these simple membranes.

For Libchaber, metabolism is a competition between two time scales: (1) the growth of cell surface area by attachment of new fatty acids, and (2) the growth of intracell volume by absorption of water. When the fatty acids and water are in perfect balance, the artificial cell can grow symmetrically as a sphere to up to 4 times its size. However, if extra fatty acids are introduced to the outer solution, then the surface area grows too quickly for the absorption of water to catch up. This forces the membrane into a chaotic adaptive configuration of low elastic energy consisting of an array of vesicles connected by thin tubules. Occasionally, stochastic fluctuations cause these tubules to sever and the cell divides. This is similar to the reproductive process of L-form bacteria, although Libchaber was quick to stress that this didn’t make his systems alive, but merely an “artificial cell in an artificial world”. He was effectively the designer: controlling the environment to exogenously introduce a higher concentration of fatty acids in the outside solution; by contrast, the L-form bacteria achieve this on their own by endogenously producing more fatty acids.

In the first talk of the day, Leslie Valiant used ecorithms to study the question of a ‘designer’ and adaptive phenomena more generally. For Valiant, biology is a specialization of computer science, and the emergence of complex circuitry without a designer is evolution in the former and a specific type of machine learning in the latter. By modeling evolution as a restriction of general PAC-learning, Valiant (2009) can overcome biology’s agnosticism about the time needed to evolve from A to B. This allows us to strengthen Dobzhansky’s “nothing in biology makes sense except in the light of evolution” by putting restrictions on the evolvable. As we try to explain some physiological or cognitive processes in animals, we can restrict our set of hypotheses by considering only those that are evolvable.

Unfortunately, the connection to machine learning is not without its downsides. In a typical machine learning application, there is a target function that the algorithm is trying to approximate. In the case of evolvability, Valiant is forced to define an “ideal function” that for every possible circumstance specifies the best action in the given environment. Although this function is unknown, and can be thought of as encoded in the structure of the physical world, it still brings us dangerously close to a teleological view of evolution and restricts us to static fitness landscapes. Finally, the fact that the target function, and thus the environment, does not depend on the learning algorithm means that the dynamics are fundamentally limited in their historicity, and cannot capture everything we are interested in about biology. Valiant’s model is only effective for short-term micro-evolution where a relatively static fitness landscape can be reasonably assumed.

After a short coffee break, Simon Levin started his talk by looking at evolution and questioning the static fitness landscape assumption. In general, the environment isn’t constant, as it is in the early NK model he developed in Kauffman & Levin (1987). Instead of a landscape of rigid hills-and-valleys, we should think of it as a waterbed where the agents’ progress through the landscape deforms it and creates new and different peaks as the environment co-evolves with the agents. This forces us to contrast the frequency-independent and game-theoretic approaches to evolution.
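For readers who haven’t met the NK model, here is a minimal sketch of a static landscape of the kind Levin was pushing back against. The construction (each locus interacting with its K cyclic neighbours, fitness components drawn uniformly at random) is one common convention; the sizes and seeds are arbitrary.

```python
import itertools
import random

def nk_landscape(N, K, rng):
    """A minimal NK fitness landscape (after Kauffman & Levin, 1987): locus i
    contributes a random fitness component that depends on its own allele and
    the alleles of the next K loci (cyclically)."""
    tables = [{bits: rng.random()
               for bits in itertools.product((0, 1), repeat=K + 1)}
              for _ in range(N)]

    def fitness(genome):
        return sum(tables[i][tuple(genome[(i + j) % N] for j in range(K + 1))]
                   for i in range(N)) / N

    return fitness

rng = random.Random(1)
N = 10
for K in (0, 2, 5):
    fitness = nk_landscape(N, K, rng)
    # Count local optima: genomes where no single bit-flip improves fitness.
    optima = sum(
        all(fitness(g[:i] + (1 - g[i],) + g[i + 1:]) <= fitness(g)
            for i in range(N))
        for g in itertools.product((0, 1), repeat=N))
    print(f"K = {K}: {optima} local optima")   # ruggedness grows with K
```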

Of course, questions of biological complexity extend beyond evolution and are a strong presence in development and cognition. Connecting biology back to computer science, Levin highlighted Turing’s foundational work on morphogenesis — explaining how global symmetry-breaking occurs in the development of a spherically-symmetrical embryo into a non-spherically symmetrical organism while obeying local symmetry-preserving rules like diffusion. Turing (1952) showed that with two types of morphogens — activators and inhibitors — with different diffusion rates, small stochastic fluctuations in their concentration could be amplified into large static (or dynamic) global patterns such as zebra stripes or leopard spots. Since diffusion rates are a way to set timescales, this separation of time-scales leading to symmetry breaking foreshadowed Libchaber’s definition of metabolism and example of artificial cell-division later in the day.
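To give a feel for the mechanism, here is a minimal one-dimensional sketch in the spirit of Turing's activator-inhibitor pairs: a slowly-diffusing activator and a fast-diffusing inhibitor on a ring of cells. The Gierer-Meinhardt-style kinetics and every parameter value are my own illustrative choices (picked so that a pattern appears), not anything from Turing's paper or Levin's talk.

```python
import numpy as np

n, dt, steps = 200, 0.1, 20000
Da, Dh = 0.05, 1.0        # activator diffuses ~20x slower than the inhibitor
b, k = 0.5, 0.01          # activator decay rate and a weak saturation term

rng = np.random.default_rng(0)
a = 2.0 + 0.01 * rng.standard_normal(n)   # small stochastic fluctuations
h = 4.0 + 0.01 * rng.standard_normal(n)   # around the homogeneous state

def laplacian(u):          # discrete Laplacian with periodic boundary (a ring)
    return np.roll(u, 1) + np.roll(u, -1) - 2.0 * u

for _ in range(steps):
    production = a * a / (h * (1.0 + k * a * a))   # self-activation, damped by h
    a_new = a + dt * (production - b * a + Da * laplacian(a))
    h_new = h + dt * (a * a - h + Dh * laplacian(h))
    a, h = a_new, h_new

# The initial noise has been amplified into a static spatial pattern of
# alternating activator peaks and troughs: stripes in one dimension.
print(np.round(a[::10], 2))
```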

Levin closed with cognition by focusing on bounded rationality in three settings: the ultimatum game, hyperbolic discounting, and collective behavior. Bounded rationality contrasts with the standard Homo Economicus by explicitly accounting for uncertainty, the cost of gathering information, and computational complexity restrictions on cognitive processing. In the ultimatum game, the agent has to balance a culture-dependent fairness norm against rational game-theoretic reasoning. These norms are heuristics (in the Kahneman sense), and their shape is guided by a general theory of meta-games. In hyperbolic discounting, the increasing discount rate leads to intertemporal inconsistencies in the agent, and to phenomena like self-enforcing behavior to overcome them. Levin contrasted the proximate and ultimate explanations for discounting, with a proximate cause being something as simple as an agent averaging the influence of many brain regions that each follow scale-free geometric discounting, and the ultimate cause being evolutionary: a selective pressure to account for uncertainty and the cost of foreseeing the future. This proximate cause of averaging is not a trivial phenomenon: whether you are looking at the collective behavior of brain regions, birds, or caribou, the result can be very rich dynamics.
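The averaging story can be made concrete in a few lines of code. The sketch below only illustrates the general point: if an agent averages many exponential (geometric) discounters whose rates are spread out, the aggregate discount curve is hyperbolic. The choice of an exponential distribution of rates (which makes the average work out to exactly $1/(1+kt)$) and all the numbers are mine, not Levin's.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 0.25
# Many components, each discounting exponentially at its own rate; the rates
# are drawn from an exponential distribution with mean k (an illustrative
# choice for which the average is exactly 1 / (1 + k*t)).
rates = rng.exponential(scale=k, size=100_000)
delays = np.array([1, 5, 20, 40])

averaged = np.exp(-np.outer(rates, delays)).mean(axis=0)   # the agent's average
hyperbolic = 1.0 / (1.0 + k * delays)                      # hyperbolic discounting

for t, avg, hyp in zip(delays, averaged, hyperbolic):
    print(f"t = {t:2d}: averaged exponentials = {avg:.3f}, hyperbola = {hyp:.3f}")
```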

In the sixth talk of the day, Naomi Ehrich Leonard picked up where Levin left off and focused on the evolution of leadership in collective migration. Collective migration is a social learning dilemma where agents need to decide if they will innovate through costly measurements of the true direction they need to travel, or simply imitate the behavior of others. Pais & Leonard (2013) focused on studying under what conditions the convergence stable strategy for their population was an evolutionarily stable strategy with a homogeneous level of investment in innovation by all agents, and when it bifurcated into a heterogeneous population of leaders and followers. They overcame the curse of computing by pushing their analytic treatment as far as possible before resorting to smart simulations. In particular, they separated the time-scales of migration (and thus, determination of fitness) and evolutionary change in levels of investment in leadership. They solved the migratory part of their model analytically, arriving at a function capturing the mean fitness and variance in fitness of leaders and followers in an unstructured herd. They could then use these fitness calculations to optimize their evolutionary simulation, and continued to investigate simpler cases (such as a herd of two) analytically. In their results, they were able to draw sharp boundaries in the cognitive cost of leadership between homogeneous investment, unstable bifurcations, stable bifurcations, and total abandonment of innovation. Further, if a population was pushed to abandon innovation then they observed a historicity effect: the cost of leadership would need to be lowered further than the stable bifurcation value before investment in innovation could be restored.

In his opening remarks, Bernard Chazelle presented the general framework of influence systems (Chazelle, 2012) and defined natural algorithms as “a new language for algorithmic dynamics”. The collective migration models studied by Leonard and Levin are a special case of these influence systems. However, Chazelle (much like Valiant with evolution) did not build his framework to answer specific questions like bifurcations between leaders and followers, but instead to build analytic tools for studying agent-based models and to explore the separation of time-scales. He used diffusion to illustrate his point:

1. Micro-level: a diffusion system is a collection of individual agents (water molecules and a diffusing substance, like a piece of dust), with each agent following simple local deterministic rules (Newton’s laws).
2. Meso-level: diffusion can be modeled as the stochastic process of Brownian motion where most molecules are treated as a random bath, and a focal agent (say a piece of dust on the water’s surface) is tracked explicitly. Note that the stochasticity arises from our choice of abstraction.
3. Macro-level: diffusion becomes a deterministic partial differential equation that describes the density of the diffusing agents at each point in space and time. A minimal sketch contrasting the micro/meso and macro views follows below.
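Here is a small, hedged illustration of that separation of scales: a meso-level description (explicit random walkers) and a macro-level description (a discretized diffusion equation) of the same process, agreeing on an aggregate quantity. The step rule, walker count, and duration are arbitrary choices for a quick run.

```python
import numpy as np

rng = np.random.default_rng(0)
steps, n_walkers = 500, 50_000

# Meso level: track individual walkers, each taking unbiased unit steps.
positions = np.zeros(n_walkers, dtype=int)
moves = np.array([-1, 1])
for _ in range(steps):
    positions += rng.choice(moves, size=n_walkers)

# Macro level: evolve a density with the discrete diffusion (heat) equation
# rho(x, t+1) = (rho(x-1, t) + rho(x+1, t)) / 2, same initial condition.
half = steps + 1
rho = np.zeros(2 * half + 1)
rho[half] = 1.0
for _ in range(steps):
    rho = 0.5 * (np.roll(rho, 1) + np.roll(rho, -1))

# Both descriptions predict the same spread, sqrt(steps) ~ 22.4, without the
# macro view ever tracking an individual walker.
print("meso (walkers):", positions.std())
print("macro (density):", np.sqrt(np.sum(rho * np.arange(-half, half + 1) ** 2)))
```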

In physics, this separation of scales is handled through renormalization. In agent-based models, the lack of symmetry makes the process much more involved. Like Valiant, Chazelle draws a parallel to computer science by viewing the algorithmic renormalization needed to identify the relevant separations of scale in agent-based models as a compiler for dynamic systems. The renormalization used in physics, by comparison, is like compiling a simple program by hand. For Chazelle, “biology = physics + history”, and history is the greatest symmetry breaker. However, we can’t remain reliant on mere simulations for exploring these history-dependent systems, and need to develop a full algorithmic calculus to grow our understanding.

Note: this is the first blogpost of a series summarizing the 2013 workshop on Natural Algorithms and the Sciences at Princeton’s Center for Computational Intractability. My order of presentation does not follow the speaker schedule since I am highlighting the talks based on my own synthesis. But don’t worry, I will mention all talks! The other four presentations from May 20th will be in the next post.

### References

Chang, T. M. (1964). Semipermeable microcapsules. Science, 146(3643): 524-525.

Chazelle, B. (2012). Natural algorithms and influence systems. Communications of the ACM, 55(12).

Kauffman, S., & Levin, S. (1987). Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1): 11-45.

Pais, D., & Leonard, N. (2013). Adaptive network dynamics and evolution of leadership in collective migration. Physica D: Nonlinear Phenomena. DOI: 10.1016/j.physd.2013.04.014

Turing, A. M. (1952). The Chemical Basis of Morphogenesis. Philosophical Transactions of the Royal Society of London 237 (641): 37–72.

Valiant, L.G. (2009) Evolvability. Journal of the ACM 56(1): 3

## Natural algorithms and the sciences

Today, I am passing through New York City on my way to Princeton’s Center for Computational Intractability for a workshop on Natural Algorithms and the Sciences (NA&S). The two day meeting will cover everything from molecular algorithms for learning and experiments on artificial cells to bounded rationality in decision-making and the effects of network topology on the evolution of collective migration. In other words, right at home for TheEGG blog. The full mission statement:

The workshop will bring together researchers from computer science, mathematics, physics, biology, and engineering to explore interactions among algorithms, dynamical systems, statistical physics, and complexity theory (in all senses of the term).

“Computer science” is a cruel historic misnomer for a field of study that doesn’t focus on just science, but combines three different approaches: engineering, science, and mathematics. Natural algorithms (or “Natural computing”, more generally) are similarly diverse, with three general themes roughly following the engineering-science-math distinctions:

1. Taking inspiration from natural processes to design metaheuristics for solving difficult engineering optimization problems. In my experience, this is the most common approach and contains ideas like: genetic algorithms, neural networks, and swarm optimization.
2. Implementing physical computations in non-electronic media. This includes ideas like molecular or DNA computing, and the experimental part of quantum computing (or combinations of both!)
3. Using ideas of information processing and computation to reason about natural and social phenomena or the theories that describe them. To me, this seems like the least well developed and most exciting view. It corresponds to the algorithmic lens and includes fields like theory of quantum computing, and algorithmic game-theory/economics.

The history of this field can be traced back to Alan Turing’s work on B-type neural networks, artificial intelligence, morphogenesis, and artificial life. However, the real foundation and popularization probably came with Holland’s introduction of genetic algorithms[1] (the same Holland that ran some of the first simulations of ethnocentrism). At first, these metaheuristics were primarily employed by computer scientists and engineers to solve optimization problems, with occasional (but often inconsequential) wanderings into applications in the sciences. It was the popularizing work of researchers like Robert Axelrod (1984) that brought genetic algorithms to masses of social scientists. The first direct mention of “Natural algorithms” in the sense used here that I could find is Dorigo’s 1992 thesis “Optimization, learning and natural algorithms”. Now, simulations using variants of the metaheuristics developed by the first stream are a common sight everywhere from neuroscience to economics to political science; a strange realization of the third stream of natural computing[2].

Unfortunately, the field largely remained stuck at simulations, and forgot its roots in mathematics and analytic models. This is the curse of computing: it is simpler to run quick simulations than to develop the analytic tools to understand the underlying phenomena. The senior co-organizer of the NA&S workshop, Bernard Chazelle, is starting to address this issue by introducing a theory of natural algorithms and an algorithmic calculus (Chazelle, 2012). He considers two types of processes, and illustrates them with the Brownian diffusion of pollen particles suspended in water:

1. Eulerian approach: a differential equation for the concentration of particles at each point and time. The strength of this approach is that it allows us to rely on the well-developed theory of calculus. However, we must commit to infinitesimal changes and to every point abiding by identical laws. In evolutionary game theory, replicator dynamics are the prototypical example of this approach, and the Ohtsuki-Nowak transform is an example of a slightly more involved application.
2. Lagrangian approach: track the movement of individual pollen particles and water molecules in accordance with Newton’s laws. If the simulation becomes intractable then simplify via mean-field or ergodic approximations; for instance, by modeling a single pollen particle with random nudges from the ambient bath. In evolutionary game theory, this approach corresponds to agent-based modeling, and suffers from the lack of a powerful theory like calculus to guide our understanding. A minimal sketch contrasting the two approaches follows this list.
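As a toy illustration of the Eulerian/Lagrangian contrast in evolutionary game theory, the sketch below runs replicator dynamics (an ODE for strategy frequencies) and a finite-population, Moran-like agent-based process side by side on the same game. The Hawk-Dove payoff matrix, population size, and update rule are all arbitrary illustrative choices.

```python
import numpy as np

payoff = np.array([[0.0, 3.0],     # Hawk vs (Hawk, Dove)
                   [1.0, 2.0]])    # Dove vs (Hawk, Dove); interior equilibrium at 50% Hawks
rng = np.random.default_rng(0)

# Eulerian view: replicator dynamics for the frequency x of Hawks.
x, dt = 0.1, 0.01
for _ in range(20_000):
    f_hawk, f_dove = payoff @ np.array([x, 1 - x])
    x += dt * x * (1 - x) * (f_hawk - f_dove)
print("replicator equilibrium:", round(x, 3))

# Lagrangian view: a finite well-mixed population updated one birth-death
# event at a time, with payoff-biased births and uniform deaths.
N, hawks = 1000, 100
for _ in range(200_000):
    freqs = np.array([hawks / N, 1 - hawks / N])
    fitness = 1.0 + payoff @ freqs                  # baseline plus average game payoff
    birth_probs = freqs * fitness
    reproducer = rng.choice(2, p=birth_probs / birth_probs.sum())
    dying_hawk = rng.random() < hawks / N
    hawks += int(reproducer == 0) - int(dying_hawk)
print("agent-based Hawk frequency:", hawks / N)     # fluctuates around the ODE value
```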

The lack of theory for understanding agent-based models means that we must usually rely on running individual simulations for each parameter setting we are interested in. This makes it difficult to understand how the assumptions of our model affect the large-scale behavior of our results, and how relaxations or modifications of those assumptions will change future models. Chazelle (2012) addresses this lack of theory by introducing a general modeling framework of influence systems and developing an algorithmic calculus to better understand them. Other participants at the NA&S workshop will present similar developments in their fields, and I will report more in future posts!

### Notes and References

1. There is a parallel development in artificial intelligence and cognitive science with the use of production-systems and neural networks. But that discussion is more involved, so I might discuss it in a separate blog post.
2. I call this realization “strange” because many of the approaches don’t start with thinking about how the underlying science functions, but instead borrow the engineers’ tools wholesale and assume that the very rough analogy made by the original computer scientists who developed the algorithms is an accurate enough reflection of reality to draw scientific conclusions. In evolution and ecology, this is most painfully obvious when researchers adopt the simple “fittest fraction of agents survive” as their selection rule, “random pairing and recombination with point-wise mutations” as their variance rule, and their paper’s conclusion as their fitness function. In cognitive science, this happens when psychologists train artificial neural networks on simplifications of human tasks and then try to draw conclusions about neural structures of real brains, even when their models are built in only vague (and not scientifically reasonable) analogy to real neural dynamics.

Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.

Chazelle, B. (2012). Natural algorithms and influence systems. Communications of the ACM, 55(12). DOI: 10.1145/2380656.2380679

Dorigo, M. (1992). Optimization, learning and natural algorithms. Ph. D. Thesis, Politecnico di Milano, Italy.

Holland, J. H. (1975). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press.

## Ethnocentrism, religion, and austerity: a science poster for the humanities

Artem Kaznatcheev and I presented a poster on May 4th at the University of British Columbia to a highly interdisciplinary conference on religion. The conference acronym is CERC, which stands for Cultural Evolution of Religion Research Consortium. Most of the 60-some attendees are religion scholars and social scientists from North American and European universities. Many are also participants in a large partnership grant from the Social Sciences and Humanities Research Council of Canada (SSHRC), spearheaded by Ted Slingerland, an East Asian scholar at UBC. Some preliminary conversations with attendees indicated considerable apprehension about how researchers from the humanities and sciences would get on. Many of us are familiar with collaborative difficulties even in our own narrow domains. Skepticism was fairly common.

As far as I know, our poster was the only computer simulation presented at the meeting. We titled it Agent-based modeling of the evolution of “religion”, with scare quotes around religion because of the superficial and off-hand way we treated it. Because we know from experience that simulations can be a tough sell even at a scientific psychology conference, we were curious about whether and how this poster would fly in this broader meeting.

We opted for a poster composed almost entirely of graphics. This makes for convenient talking points and does not demand much reading, which is in any case anathema to social interaction and difficult to do in the noisy environment of a poster session.

The major theme of this SSHRC grant is that humans might have evolved religion because religion enforces cooperation. Although I was a late joiner of the SSHRC grant, our poster addressed two of the five major sub-questions posed in the grant application, from the standpoint of simulations of biological evolution: whether cooperation is inherently parochial, and whether cooperation is enhanced by existential insecurity. The left column of the A1-sized poster dealt with the parochial question, while the right column dealt with the insecurity question.

### Cooperation is parochial

The table on the left shows the net outcomes for player 1 in a well-known non-zero-sum game which highlights the essential dilemma of cooperation. Each of two players can either cooperate with the other or not. The payoff for the row player is computed as the benefit of receiving cooperation minus the cost of giving it. If player 1 cooperates with a cooperator, player 1 gets the benefit of receiving cooperation and loses the cost of giving it. If player 1 cooperates against a defector, player 1 incurs the cost of giving cooperation but enjoys no benefit (the so-called sucker’s payoff). If player 1 defects against a cooperator, player 1 enjoys the benefit of cooperation without incurring any cost (the so-called temptation to defect). When defection is mutual, the outcome is 0, as no cooperation occurs.

Without knowing what the other player will do, mutual defection is the rational solution (Nash equilibrium). However, as long as benefit exceeds cost, which it often does in specialized populations, the best overall outcome for both parties is mutual cooperation (Pareto optimum). Because of the competitive nature of natural selection, cooperation between biological agents has been a mystery since Darwin, and yet cooperation is extremely common across biology. One explanation is that species have somehow evolved to cooperate.
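For concreteness, here is the game in code, written in the benefit-minus-cost form used on the poster. The particular numbers are placeholders; any benefit greater than the cost (both positive) produces the same dilemma.

```python
# b: benefit of receiving cooperation, c: cost of giving it (illustrative values).
b, c = 3.0, 1.0
payoff = {                    # net outcome for player 1: (player 1's move, player 2's move)
    ("C", "C"): b - c,        # mutual cooperation
    ("C", "D"): -c,           # the sucker's payoff
    ("D", "C"): b,            # the temptation to defect
    ("D", "D"): 0.0,          # mutual defection
}

# Defection is the better reply to either move, so mutual defection is the
# Nash equilibrium...
for other in ("C", "D"):
    assert payoff[("D", other)] > payoff[("C", other)]
# ...yet as long as b > c, both players prefer mutual cooperation to mutual
# defection (the Pareto optimum): the essential dilemma of cooperation.
assert payoff[("C", "C")] > payoff[("D", "D")]
print("dilemma holds for b =", b, "and c =", c)
```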

In every evolutionary cycle of our computer simulations, each agent interacts with each of its immediate von Neumann neighbors by playing a one-shot prisoner’s dilemma game, with the game’s outcomes added to the agent’s potential to reproduce. An agent can reproduce if a randomly-drawn proportion is less than this modified potential, and if there is an empty adjacent space for the offspring.

The world is a 50-by-50 toroidal (to ensure that each agent has the same number of potential neighbors) lattice, as illustrated in this sample figure after 50 evolutionary cycles. Each agent has three simple genes. Two genes specify cooperation or defection against in-group or out-group neighbors, and the third specifies a group tag so that an agent can identify whether their neighbor is from the same group. The world starts empty, and each evolutionary cycle a new, randomly-constructed agent immigrates into a randomly-chosen cell. Cloned offspring are just like their parent, except for a tiny mutation rate.
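To make the cycle concrete, here is a stripped-down sketch of the simulation loop described above (immigration, prisoner's dilemma interactions with von Neumann neighbours, payoff-modified reproduction, mutation, and death). The numeric values for the base reproductive potential, benefit, cost, death rate, and mutation rate are illustrative placeholders rather than the parameters used for the poster.

```python
import random

SIZE, N_TAGS = 50, 4
BASE_PTR, BENEFIT, COST = 0.12, 0.03, 0.01     # placeholder parameter values
DEATH, MUTATION = 0.10, 0.005
rng = random.Random(0)
world = {}    # (x, y) -> agent: a group tag plus in-group and out-group strategy genes

def random_agent():
    return {"tag": rng.randrange(N_TAGS),
            "in": rng.random() < 0.5,      # cooperate with in-group neighbours?
            "out": rng.random() < 0.5}     # cooperate with out-group neighbours?

def neighbours(x, y):                      # von Neumann neighbourhood on a torus
    return [((x + dx) % SIZE, (y + dy) % SIZE)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

def mutated(parent):
    child = dict(parent)
    for gene in ("tag", "in", "out"):
        if rng.random() < MUTATION:
            child[gene] = rng.randrange(N_TAGS) if gene == "tag" else not child[gene]
    return child

def cycle():
    # 1. Immigration: one randomly-constructed agent enters an empty cell.
    empty = [(x, y) for x in range(SIZE) for y in range(SIZE) if (x, y) not in world]
    if empty:
        world[rng.choice(empty)] = random_agent()
    # 2. Interaction: a one-shot PD with every neighbour; outcomes modify the
    #    agent's potential to reproduce.
    potential = {cell: BASE_PTR for cell in world}
    for cell, agent in world.items():
        for ncell in neighbours(*cell):
            other = world.get(ncell)
            if other is None:
                continue
            cooperates = agent["in"] if agent["tag"] == other["tag"] else agent["out"]
            if cooperates:
                potential[cell] -= COST
                potential[ncell] += BENEFIT
    # 3. Reproduction: clone (with mutation) into a random empty adjacent cell
    #    if a random draw falls below the modified potential.
    for cell, agent in list(world.items()):
        if rng.random() < potential[cell]:
            free = [n for n in neighbours(*cell) if n not in world]
            if free:
                world[rng.choice(free)] = mutated(agent)
    # 4. Death: each agent dies with a fixed probability, freeing its cell.
    for cell in [c for c in world if rng.random() < DEATH]:
        del world[cell]

for t in range(200):
    cycle()
print(len(world), "agents after 200 cycles")
```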

At 50 cycles, the world is still relatively empty, but the beginnings of correlations between strategy, tag, and location can already be detected. Group tag is indicated by color, and strategy by letter. Selfish agents don’t cooperate with anyone, traitors cooperate outside but not inside their group, ethnocentrics cooperate inside but not outside their group, and humanitarians cooperate with everyone. These sample plots are for a smaller 25×25 lattice so that the world’s state is easier to view.

After 500 evolutionary cycles, this same world is much fuller, and exhibits ethnocentric dominance as seen on the left. Below right, a plot of strategy means and standard errors shows results for 50 worlds across 1000 cycles. When the population saturates, around 200 cycles, ethnocentrics start to pull away from their close competitor, humanitarians. The reason for this separation is that humanitarians cooperate across group frontiers, incurring the sucker’s payoff, while ethnocentrics do not, a benefit of succumbing to the temptation to defect. Ethnocentric dominance is robust across wide variation in parameter settings, as long as the temptation to defect exceeds the cost of cooperation. Selfish and traitorous strategies never get much traction because they don’t take care of their own kind. Due to mutations, these free-rider strategies do not disappear (Hartshorn, Kaznatcheev, & Shultz, in press).

It is also interesting to note that ethnocentric cooperation is common across a wide range of species, from bacteria and plants to people (witness voting patterns and ethnic cleansing). So, yes, cooperation does tend to be parochial, at least with viscous populations and group tags, and when group detection is not too costly (Kaznatcheev, 2010a, 2010b; Kaznatcheev & Shultz, 2011). Evolution favors ethnocentrism because it keeps cooperation high (here around 75% of interactions), sometimes even higher than humanitarianism does.

### Austerity promotes cooperation

We implemented existential insecurity by varying the austerity of the environment. In simulations, an austere environment can be created by either increasing death rate or decreasing birthrate. We varied birthrate, while holding death rate constant. This preliminary experiment was done similarly to the parochial simulation except for the absence of group tags, to keep things simple. The only two strategies here are cooperation or defection.

This figure on the left shows standard errors around the mean proportion of cooperation. There are more cooperative behaviors in austere than in easy worlds.

Below left, standard error values around the mean number of strategies in an austere environment reveal more cooperative strategies than defection strategies. Below right, the opposite is true for an easy environment.

So, in this very abstract study, austerity favors the evolution of cooperation. We do still need to explore the parametric space of this simulation and run it with group tags.

How well was all this received by our new multi-disciplinary colleagues? They seemed to get it: even if they were unfamiliar with the methodology, they readily recognized the implications for religion, and they found it very interesting. Some even liked our deliberately evocative names for cooperation strategies. Most of the time, I inadvertently (or sometimes intentionally) neglected to mention that different religions could be represented by group tags. To my initial surprise, our poster goers did not need to hear me say that – they automatically inferred it. None objected to our superficial characterization of religion. Perhaps this connection is already evident from the many religion-fueled conflicts around the world, currently and throughout human history.

Of course, our simulations are considerably more general than religious tags. Unlike other species with relatively fixed group tags, humans have proven to be quite flexible in the tags they use to discriminate in- from out-group members. Humans use skin color, language, language accent (Cohen, 2012), location, and many other initially arbitrary tags to determine what sort of agent they are dealing with. In many urban centers around the world, it is important not to wear the wrong-color shirt in a particular neighborhood. Such tags are cultural products, but the important point is that humans are genetically prepared to use group tags in support of their ethnocentric patterns of cooperation.

Many excellent questions were asked about the poster, some of which could be answered with reference to other simulations from our lab and this blog. To cite a single example, Professor Lyle Eslinger, a particularly astute religious-studies colleague from U. of Calgary, noted that our austerity simulation reminded him of the pioneering (1902) observations of Prince Kropotkin in the austere regions of Siberia. Cooperation was more evident to Kropotkin than the competition noted by Darwin in the easier Galápagos Islands. Not many computational modelers would have spontaneously come up with that reference.

Post-conference, I received a batch of papers from Ian MacDonald, a student of David Sloan Wilson at U. of Binghamton, who gave a conference talk on their study of church-goers in that city. The gist of these correlational studies is that religious activity is positively related to austere environmental conditions (Delamontagne, 2010; Gill & Lundsgaarde, 2004; Immerzeel & van Tubergen, 2013; Ruiter & van Tubergen, 2009). In a complementary fashion, our abstract simulation shows that increased austerity causes an increase in cooperation and cooperative strategies; perhaps additional simulations will provide further insights and qualifications.

We are pleased at this reception and at having an excellent opportunity to meet and interact with our new colleagues.

It turns out that we humans and these abstract computer agents are not alone in exhibiting a positive relation between environmental austerity and cooperation. This figure shows that yeast populations with pure cooperators are better able to survive a sudden and harsh environmental change than are mixed populations of cooperators and defectors. All six pure-cooperator populations and only one of six mixed populations were able to withstand a sudden drop in the food supply on day 3 (Sanchez & Gore, 2013).

Notwithstanding all this harmony, I do anticipate some downstream controversy around the notion of group selection as an evolutionary mechanism. It is relevant to note that the effects in our simulations are products of simulated individual selection. Our groups are emergent properties of natural selection acting on the reproductive potentials of individual agents. Notions of group selection are not necessary to explain our simulation results.

### References

Cohen, E. (2012). The evolution of tag-based cooperation in humans: the case for accent. Current Anthropology, 53, 588-616.

Delamontagne, R. G. (2010). High religiosity and societal dysfunction in the United States during the first decade of the twenty-first century. Evolutionary Psychology, 8(4), 617-657.

Gill, A., & Lundsgaarde, E. (2004). State welfare spending and religiosity: a cross-national analysis. Rationality and Society, 16(4), 399-436.

Hartshorn, M., Kaznatcheev, A., & Shultz, T. R. (in press). The evolutionary dominance of ethnocentric cooperation. Journal of Artificial Societies and Social Simulation.

Immerzeel, T., & van Tubergen, F. (2013). Religion as reassurance? Testing the insecurity theory in 26 European countries. European Sociological Review, 29(2), 359-372.

Kaznatcheev, A. (2010a). The cognitive cost of ethnocentrism. In S. Ohlsson & R. Catrambone (Eds.),Proceedings of the 32nd Annual Conference of the Cognitive Science Society (pp. 967-971). Austin, TX: Cognitive Science Society. [pdf]

Kaznatcheev, A. (2010b). Robustness of ethnocentrism to changes in inter-personal interactions. In Complex Adaptive Systemes – AAAI Fall Symposium. Arlington, VA. [pdf]

Kaznatcheev, A., & Shultz, T. R. (2011). Ethnocentrism maintains cooperation, but keeping one’s children close fuels it. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, 3174-3179. [pdf]

Ruiter, S., & van Tubergen, F. (2009). Religious attendance in cross-national perspective: a multilevel analysis of 60 countries. American Journal of Sociology, 115(3), 863-895.

Sanchez, A., & Gore, J. (2013). Feedback between population and evolutionary dynamics determines the fate of social microbial populations. PLoS Biology, 11(4), e1001547.

## Did group selection play a role in the evolution of plasmid endosymbiosis?

Bacterial plasmids are nucleotide sequences floating in the cytoplasm of bacteria. These molecules replicate independently from the main chromosomal DNA and are not essential to the survival or replication of their host. Plasmids are thought to be part of the bacterial domain’s mobilome (for an overview, see Siefert, 2009), a sort of genetic commonwealth which most, if not all, bacterial cells can pull from, incorporate, and express. Plasmids can replicate inside a host and then move to another cell via horizontal genetic transfer (HGT), a term denoting various mechanisms of incorporation of exogenous genetic material. Of these mechanisms, two are relevant to plasmids:

1. transformation – the uptake of genetic material in the cell’s surrounding environment through its membrane, and
2. conjugation – the direct transfer of genetic material via cell-to-cell contact.

Interestingly, in the case of conjugation, most plasmid-carrying cells have mechanisms to detect whether a potential recipient cell already carries the plasmid to be transferred, and initiate transfer only if it does not. Horizontal genetic transfer plays a role for bacteria similar to the one sexual reproduction plays for sexually differentiated forms of life, by increasing genetic variability and thus evolutionary potential. Most plasmids are endosymbionts to their host cell and may serve, among other functions, to foster antibiotic resistance as well as cause benign bacterial strains to become virulent. Other plasmids, however, parasitize their host and degrade cell function and fitness.

### Endosymbionts: evolution of cooperation

Endosymbiotic plasmids provide an interesting case study for inquiring into the evolution of cooperation. For instance, in his 2002 paper that I mentioned earlier, Paulsson claims that replication control by plasmids is the product of group selection — if plasmids are seen as the individuals and the host cell as the group. If plasmids replicated up to the cell’s carrying capacity, they would increase their chance of fixing in the descendant cells relative to their cell mates but would significantly hamper their host. Viewed from a group selection standpoint, within-group selection should then favor “selfishness” — a high replication rate — while between-group selection should favor limited replication — “altruism”. By controlling their replication below carrying capacity via a sophisticated feedback system involving plasmid-encoded replication activators and inhibitors, many plasmids have struck a balance between these opposing selection gradients. (Paulsson, 2002)

A few weeks ago I gave a presentation on Paulsson’s paper about plasmid group selection for the EGT group, at the end of which Artem suggested that we build a game-theoretic model to better understand, and generalize, the mechanisms at play behind plasmid replication control. To be honest, I am only beginning to dabble with game theory, so I am unsure exactly where our work is leading. But it is nevertheless clear to me that before building any such model, we should understand the basics of how plasmids replicate within a cell and how they split at cell division, for these two mechanisms — replication and partition — greatly influence plasmid fitness.

### Replication control

To be maintained across generations of bacterial cells, plasmids must ensure that they replicate at least once during the life cycle of their host. As a result, most plasmids have evolved systems to enable and control their replication. Some plasmids replicate only once each cell cycle, as is the case for the prototypical plasmid, the F-plasmid; others replicate many times per cycle. Many plasmids control their replication via a feedback system in which activators promote replication and inhibitors contain it. In the case of two well-studied plasmids, R1 and ColE1, activators can act in cis – in which case the cis sequence is usually neighboring the replication initiation sequence and affects only that sequence – or in trans – in which case the trans sequence affects the initiation sequences of all activator-sensitive plasmids in the cell (note that plasmids of different types are not necessarily insensitive to other plasmids’ activators/inhibitors; more about that in the penultimate paragraph). Inhibitors, by contrast, act only in trans. The distinction between cis and trans is an important one, because any mutation to a trans activator will be “public” to all activator-sensitive plasmids in the cell, whereas cis activator/inhibitor mutations will be private to the mutant. (Paulsson, 2002)
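To get a feel for how inhibitor-based feedback keeps copy number in check, here is a toy stochastic sketch: every plasmid copy contributes to a trans-acting inhibitor, so the per-copy replication rate falls as copy number rises, and random partition halves the copies at division. The control function, rates, and copy-number target are my own illustrative choices, not Paulsson's (2002) model or any measured plasmid system.

```python
import random

rng = random.Random(0)
K, BASE_RATE, CELL_CYCLE, GENERATIONS = 10, 2.0, 1.0, 30

n = 2    # plasmid copies in the founding cell
for gen in range(GENERATIONS):
    # Replication during one cell cycle: replication events arrive at a rate
    # that grows with copy number but is damped by the (copy-number-
    # proportional) inhibitor concentration.
    t = 0.0
    while True:
        rate = n * BASE_RATE / (1.0 + (n / K) ** 2)
        t += rng.expovariate(rate)          # waiting time to the next replication
        if t > CELL_CYCLE:
            break
        n += 1
    # Division: each copy goes to the daughter we follow with probability 1/2
    # (purely random partition; dedicated partition systems would do better).
    n = sum(rng.random() < 0.5 for _ in range(n))
    if n == 0:
        print(f"generation {gen}: the tracked lineage lost the plasmid")
        break
    print(f"generation {gen}: {n} copies after division")
```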

### Partition control

Not only must plasmids secure their replication, they must also ensure that, once replicated, the copies will be split between the daughter cells in such a way that each daughter contains at least one plasmid copy. For instance, in the case of the F-plasmid, since the latter replicates only once, the partition control mechanism has to ensure, and does, that each daughter contains exactly one copy. Unlike the partitioning of chromosomal DNA during cell mitosis, which was first observed at the end of the nineteenth century and is now well understood, the precise mechanisms of plasmid partitioning largely remain a mystery. Some hints have recently been uncovered, however. A few studies have shown that some plasmids use a mechanical system similar to that which partitions chromosomes during mitosis. Where that is the case, plasmids are tethered to each pole of the dividing cell with protein “strings” or tubular structures that pull sister plasmids apart as the host divides. But it has recently been suggested that the dominant mechanism for plasmid partitioning might be molecular rather than mechanical in nature. For low-copy-number plasmids, especially, some authors have shown that plasmids encode Par ATPases, proteins built around the Walker A amino-acid sequence motif that ensure effective partition (Sherratt, 2013). I won’t delve into this further here, mostly because I know so little about the subject, and also because I intend to deepen this discussion in future posts.

### Incompatibility

Many plasmids are incompatible with each other, meaning that when both are present in a given host, one of the two will fix at the expense of the other. It is now known that such incompatibility is mainly due to mutual susceptibility to each plasmid’s replication and partition control system. Let us take the simple example of two plasmids, each of which has a characteristic copy number $m$ of 1. If the two plasmids share the same replication control mechanism, the latter will allow one of the two plasmids to replicate, but will then prevent any further replication. The losing plasmid will not be replicated, and will thus be found in only one of the daughter cells. (Novick, 1987)
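The same logic can be caricatured in a few lines: two plasmid types that share one replication and partition control system, so the cell only 'sees' the total copy number. With nothing to distinguish the types, random replication and random partition amount to pure drift, and one type quickly fixes in any lineage we follow. The copy number and the number of lineages below are arbitrary; this is an illustration of the drift argument, not Novick's (1987) analysis.

```python
import random

rng = random.Random(0)
N = 4    # total copy number maintained by the shared control system

def generations_to_fixation():
    a = N // 2                                   # start with half type A, half type B
    gens = 0
    while 0 < a < N:
        pool = ["A"] * a + ["B"] * (N - a)
        while len(pool) < 2 * N:                 # replicate random copies up to 2N
            pool.append(rng.choice(pool))
        rng.shuffle(pool)
        a = pool[:N].count("A")                  # follow one daughter cell's random half
        gens += 1
    return gens, ("A" if a == N else "B")

for lineage in range(5):
    gens, winner = generations_to_fixation()
    print(f"lineage {lineage}: plasmid {winner} fixed after {gens} generations")
```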

This post was meant to sketch the very basics of the dynamics affecting plasmid replication and partition. Plasmids have sophisticated mechanisms for controlling replication and partition and ensuring their viability down bacterial cell lines. Although the details of these control systems are often still shrouded in mystery, the little we know of them should enable us to sketch a simple yet plausible model of plasmid cooperation.

### References

Novick, R.P. (1987) Plasmid incompatibility. Microbiological Reviews, 51-4: 381-395

Paulsson J (2002). Multileveled selection on plasmid replication. Genetics, 161 (4), 1373-84 PMID: 12238464

Siefert, J. L. (2009). Defining the mobilome. In Horizontal Gene Transfer (pp. 13-27). Humana Press.

Sherratt, D. (2013). Plasmid partition: sisters drifting apart. The EMBO Journal, 32: 1208-1210.

## Evolution explains the fundamental constants of physics

While speaking at TEDxMcGill 2009, Jan Florjanczyk — friend, quantum information researcher, and former schoolmate of mine — provided one of the clearest characterizations of theoretical physics that I’ve had the pleasure of hearing:

Theoretical physics is about tweaking the knobs and dials and assumptions of the laws that govern the universe and then interpolating those laws back to examine how they affect our daily lives, or how they affect the universe that we observe, or even if they are consistent with each other.

I believe that this definition extends beyond physics to all theorists. We are passionate about playing with the stories that define the unobservable characters of our theoretical narratives and watching how our mental creations get along with each other and affect our observable world. With such a general definition of a theorist, it is not surprising that we often see such thinkers cross over disciplinary lines. The most willing to wander outside their field are theoretical physicists; sometimes they have been extremely influential interdisciplinarians, and at other times they have suffered from bad cases of interdisciplinitis.

The physicists’ excursions have been so frequent that it almost seems like a hierarchy of ideas developed — with physics and mathematics “on top”. Since I tend to think of myself as a mathematician (or theoretical computer scientist, but nobody puts us in comics), this view often tempts me, but deep down I realize that the flow of ideas is always bi-directional and no serious field can be dominant over another. To help slow my descent into elitism, it is always important to have this realization reinforced. Thus, I was extremely excited when Jeremy Fox of Dynamic Ecology drew my attention to a recent paper by theoretical zoologist Andy Gardner (in collaboration with physicist J.P. Conlon) on how to use the Price equation of natural selection to model the evolution and adaptation of the entire universe.

Since you will need to know a little bit about the physics of black holes to proceed, I recommend watching Jan’s aforementioned talk. Pay special attention to the three types of black holes he defines, especially the Hubble sphere:

As you probably noticed, our universe isn’t boiling: the knobs and dials of the 30 or so parameters of the Standard Model of particle physics are exquisitely well-tuned. These values seem arbitrary, and even small modifications would produce a universe incapable of producing or sustaining the complexity we observe around us. Physicists’ default explanation of this serendipity is the weak anthropic principle: the only way we would be around to observe the universe and ask “why are the parameters so well tuned?” is if the universe were tuned to allow life. However, this argument is fundamentally unsettling: it lacks any mechanism.

Smolin (1992) addressed this discomfort by suggesting that the fundamental constants of nature were fine-tuned by the process of cosmological natural selection. The idea extends our view of the possible to a multiverse (not to be confused with Deutsch’s idea) that is inhabited by individual universes that differ in their fundamental constants and give birth to offspring universes via the formation of black holes. Universes that are better tuned to produce black holes sire more offspring (i.e. have a higher fitness) and are thus more common in the multiverse.

Although Smolin (2004) worked to formalize this evolutionary process, he could not achieve the ecological validity of Gardner & Conlon (2013). Since I suspect the authors’ paper is a bit tongue-in-cheek, I won’t go into the details of their mathematical model and instead provide some broad strokes. They consider deterministically developing universes (stochastic ones appear in the appendix), and a 1-to-1 mapping between black holes in one generation of universes and the universes of the next generation. Since — as Jan stressed — we can never go inside black holes to measure their parameters, the authors allow for any degree of heritability between parent and offspring universes. At the same time, they consider a control optimization problem with the objective function of maximizing the number of black holes. They then compare the Price dynamics of their evolutionary model to the optimal solution of the optimization problem and show a close correspondence. This correspondence implies that successive generations of universes will seem increasingly designed for the purpose of forming black holes (without the need for a designer, of course).
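For readers who haven't met the Price equation, here is the bookkeeping it does, in a toy form: a handful of 'universes' with a heritable character $z$ (think of it as how well-tuned the constants are for black-hole production) and a fitness $w$ (the number of black holes, i.e. offspring universes). All the numbers are made up for illustration; this is the generic Price identity, not Gardner & Conlon's actual model.

```python
import numpy as np

z = np.array([0.2, 0.5, 0.8, 0.9])           # parental characters
w = np.array([1.0, 2.0, 4.0, 5.0])           # fitness: offspring (black hole) counts
z_offspring = z + np.array([0.05, -0.02, 0.01, 0.03])   # imperfect heredity

w_bar = w.mean()
selection = np.cov(w, z, bias=True)[0, 1] / w_bar        # Cov(w, z) / mean fitness
transmission = np.mean(w * (z_offspring - z)) / w_bar    # E[w * (z' - z)] / mean fitness
price_delta = selection + transmission                   # the Price equation

# Direct computation of the change in mean character between generations,
# weighting each parent's offspring character by its number of offspring.
direct_delta = np.average(z_offspring, weights=w) - z.mean()
print(round(price_delta, 6), round(direct_delta, 6))     # the two agree exactly
```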

You might object: “I’m not a black hole, why is this relevant?” Well, it turns out that universes that are designed for producing black holes are also ones that are capable of sustaining the complexity needed for intelligent observers to emerge (Smolin, 2004). So, although you are not a black hole, the reason you can get excited about studying them is because you are an accidental side-effect of their evolution.

### References

Gardner, A., & Conlon, J. (2013). Cosmological natural selection and the purpose of the universe. Complexity. DOI: 10.1002/cplx.21446

Smolin, L. (1992). Did the universe evolve?. Classical and Quantum Gravity, 9(1), 173.

Smolin, L. (2004). Cosmological natural selection as the explanation for the complexity of the universe. Physica A: Statistical Mechanics and its Applications, 340(4), 705-713.

Tegmark, M., Aguirre, A., Rees, M. J., & Wilczek, F. (2006). Dimensionless constants, cosmology, and other dark matters. Physical Review D, 73(2), 023505.

## Four color problem, odd Goldbach conjecture, and the curse of computing

For over twenty-three hundred years, at least since the publication of Euclid’s Elements, the conjecture and proof of new theorems have been the sine qua non of mathematics. The method of proof is at “the heart of mathematics, the royal road to creating analytical tools and catalyzing growth” (Rav, 1999; pg 6). Proofs are not mere justifications for theorems; they are the foundations and vessels of mathematical knowledge. Contrary to popular conception, proofs are used for more than supporting new results. Proofs are the results; they are the carriers of mathematical methods, technical tricks, and cross-disciplinary connections.

Of course, at its most basic level, a proof convinces us of the validity of a given theorem. The dramatic decisiveness of proofs with respect to theorems is one of the key characteristics that set mathematics apart from other disciplines. A mathematical proof is unique in its ability to reveal an invalid conclusion as faulty even to its author. Contrast this with Max Planck’s conception of progress in science:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.

Further, unlike in science, a mathematical conclusion is shown to be faulty by the proofs and derivations of other mathematicians, not by external observations. In fact, if we use Karl Popper’s falsification as the criterion for science then mathematics (as practiced) is clearly distinct. Consider a typical statement, such as the recently proven odd (or weak) Goldbach conjecture:

Weak Goldbach conjecture

Every odd integer $n > 5$ is the sum of three primes.

Given any specific integer $n$, we can verify that it is the sum of three primes by looking at the finite list of primes less than $n$ and seeing if any three of them (with repetitions allowed) add up to $n$. In other words, we could set a computer program running to check odd numbers one at a time, and if we ever came to one that couldn’t be written as the sum of three primes then we would have falsified (in the proper Popper sense) our conjecture.
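For concreteness, here is a minimal sketch in Python of the check for a single odd number (the function names are mine, and the naive primality test is only meant for illustration); the falsification search described above would just loop this check over the odd integers and halt on a counterexample:

```python
from itertools import combinations_with_replacement

def is_prime(k):
    # naive trial division; fine for illustration, not for large k
    return k >= 2 and all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))

def is_sum_of_three_primes(n):
    # check the finite list of primes below n for a triple (repeats allowed)
    primes = [p for p in range(2, n) if is_prime(p)]
    return any(a + b + c == n
               for a, b, c in combinations_with_replacement(primes, 3))

print(is_sum_of_three_primes(21))  # True, e.g. 3 + 5 + 13
```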

“What makes this unscientific?”, you ask, “the weak Goldbach conjecture is precise and falsifiable; Popper would call it science.” The distinction is in how math is practiced: if the weak Goldbach conjecture only had to pass scientific standards of truth then it would have been an accepted result 271 years ago, when Christian Goldbach first doodled it in the margins of his letter to Leonhard Euler. We wouldn’t need top mathematicians like H.A. Helfgott proving it.

What the mathematical community needs is a clear argument that shows that the conjecture holds for all odd integers, not just the ones we bothered to test. A poor programmer might jump at this opportunity by (1) writing down a program that looks at the odd integers one at a time and halts only if it finds a counterexample. He could then (2) write a second program that takes the code of any program and decides if it will halt or not; if the task proves too difficult, maybe he could offer $1000 for it on GetACoder.com. Finally, (3) if the second program reports that the first program halts then there must be a counterexample and the conjecture is false, and if the second program reports that the first program doesn’t halt then the odd Goldbach conjecture is true. Of course, you realized that the second step is the Halting problem that Alan Turing famously proved to be undecidable. For computer scientists, in fact, the saddest news in the resolution of the odd Goldbach conjecture is that they can no longer use it as an example when introducing the Halting problem in COMP-101.

However, as theorists, we can suspend our disbelief and pretend that maybe we could solve the Halting problem; what would that mean for mathematics? Suppose we had a mechanical oracle like Rav’s PYTHIAGORA that could read any conjecture and definitively declare whether or not it is valid: “if all our toil and sweat in proving theorems served only to ascertain whether they were true or not, then PYTHIAGORA would deliver us of useless labors and frustrations” (Rav, 1999; pg. 6). Such a road would not lead us towards a mathematical paradise, and Rav shows that if PYTHIAGORA existed then it “would have dealt a death blow to mathematics, for we would cease having ideas and candidates for conjectures” (Rav, 1999; pg. 6). A proof does more than verify a conjecture; it (often) establishes connections between fields, or introduces new methods and questions that are useful beyond the proven theorem. Most importantly, a proof is not a black box, it is a narrative to be consumed by the mathematician. What the mathematical community wants is for the argument for the odd Goldbach conjecture to be beautiful and insightful, to build understanding. Although the universal PYTHIAGORA of our imagination is impossible, computer assisted proofs run the risk of moving us closer to this dystopia.

Any discussion of computers in mathematics has to open with the four color problem, a problem that began its life in 1852 as a conjecture in graph theory: given any separation of a plane into contiguous regions, are four colors enough to color the regions of the map so that no two regions that share a common boundary (that is not a corner, where corners are the points shared by three or more regions) have the same color? In 1977, Kenneth Appel, Wolfgang Haken, and John Koch published a computer-assisted proof and birthed the new four color problem, a philosophical question about the nature of proof and the role of computers in mathematics. Thomas Tymoczko addressed the new problem in his 1979 paper “The four-color problem and its philosophical significance”. Tymoczko (1979) characterizes proofs based on the anthropology, epistemology, and logic of mathematics. Proofs are (a) convincing, (b) surveyable, and (c) formalizable. Surveyability plays a key role in Tymoczko’s analysis, and he defines a proof as surveyable if it can be looked over and verified by a rational agent under realistic constraints. Without surveyability, a proof is not a priori.
In the four-color theorem (4CT), a key lemma was obtained by a computer. Appel and Haken provided an inductive proof of the 4CT which relies on considering the reducibility of 1834 configurations. Note that, unlike the poor programmer’s approach to the odd Goldbach conjecture, real math is involved in first reducing the infinite number of possible partitions of the plane to a finite number of restricting configurations. Establishing the reducibility of each configuration is tedious and long, but algorithmic. Koch provided the algorithm for verifying the reducibility of the configurations, and the verification by computer serves as the lemma which completes the proof by induction.

Yesterday’s proof of the odd Goldbach conjecture by Harald Andrés Helfgott (2012, 2013) follows a similar pattern of reducing an infinite statement to a proof with a large but finite number of cases to verify by computer. Helfgott applies the circle method that was developed by Hardy and Ramanujan in 1916. The method usually produces statements of the form ‘for sufficiently large $n$ (i.e. $n > n_0$) such-and-such property holds’. Helfgott refined the method to have ‘such-and-such property’ mean ‘is a sum of 3 primes’, and tightened $n_0$ to be about $10^{30}$. This application of the circle method was the math of the proof, and a computer search was used to establish the lemma for $n < 10^{30}$.

The brute-force proof of such a computer lemma is so long that no mathematician has, or even can, survey it; thus, Tymoczko (1979) argues, mathematicians must trust the accuracy of their computers – an experiment! If mathematicians are to accept the 4CT they must acknowledge the lemma as a computer experiment, much like ones in the natural sciences. For Tymoczko (1979), this distinction makes computer assisted proofs a posteriori and leads down a road of quasi-empiricism. From my experience as a programmer, and the difficulty of ensuring that code is accurate, I agree that assistance from a computer can sometimes feel like an experiment. However, Gonthier (2008) has formally shown that there are no programming errors in the proof of the 4CT. But that doesn’t make the proof surveyable in Tymoczko’s sense; at least not by a human, only by the rational agency of a computer.

This is not unique to computer assisted proofs: most modern mathematical proofs are so extensive that many of them are not surveyable from first principles; the proofs rely on previous theorems and results. For many modern proofs, a single mathematician has no chance of surveying the whole proof and all the previous results it is based upon (and the results those results are based on, etc.). Mathematicians rely on modularity; they take previously proven statements as true and seldom question their validity. Surveying a proof is then simply looking at the novel material and taking previous results as given, so a proof is surveyable not if a single rational agent can survey it, but if a collection of agents can do so together. With this definition of surveyability that accounts for the modularity of mathematics, surveying the 4CT becomes much more tractable. Even though one mathematician has no chance of surveying all 1834 configurations considered by the computer, she can survey a single configuration, and together 1834 mathematicians can survey the 4CT. Thus, I cannot accept Tymoczko’s argument that computer assisted proofs are necessarily a posteriori, but I agree that something smells fishy.
For me, the issue is not general surveyability, but internalization. No mathematician fully understands the computational part of the proof; at least no more than a pointy-haired boss understands the task his engineers completed. Although some AI enthusiasts might argue that the computer understands its part of the proof, most mathematicians (or people in general) would be reluctant to admit computers as full participants in the “social, informal, intuitive, organic, human process” (De Millo, Lipton, & Perlis, 1979; pg. 269) of mathematics. For De Millo, Lipton, & Perlis (1979), PYTHIAGORA’s verification, or the computer assisted part of a proof, is simply meaningless; it does not contribute to the community’s understanding of mathematics. This is easiest to see in the odd Goldbach conjecture: what understanding do we gain from the odd numbers below $10^{30}$ that Helfgott’s computer program checked? It teaches us no new mathematics, no new methods, and brings no extra understanding beyond a verification.

In an alternative world without computer proofs, this verification would be absent. On the one hand, this means that the alternative Helfgott would only tighten but not resolve the conjecture. On the other hand, the problem would remain open and continue to keep researchers motivated to find completely analytic ways to resolve it. Of course, even in our real world, a few mathematicians will continue looking for a non-computer-assisted proof of the weak Goldbach conjecture, just as they continue to do with the four color theorem. However, the social motivation will be lower and progress slower. This is the curse of computing: giving up understanding for an easy verification.

Of course, I don’t mean this to detract from Helfgott’s breakthrough; the real danger of the curse is not among mathematicians but among theorists and modelers. Instead of the mathematical reasoning and deep understanding encouraged by the biologists, economists, and physicists of the past, theorists today are increasingly satisfied with computer simulations. In fields where the simulations can be directly linked to experiment (physics, chemistry, etc.) this is largely alright. In biology, economics, and the social sciences, however, the link of simulation to experiment is weak or nonexistent. Models serve as heuristics to inform largely verbal theories. They are built to pump out surprising results but not to extend understanding. Ironically, a co-founder of the Santa Fe Institute (the home of agent-based modeling and complexology), Murray Gell-Mann, best states the computing dilemma:

What is the point of studying a complex system that we don’t understand by making a computer model that we don’t understand?

My pessimistic answer is ‘it’s easy’: writing an ‘abstract’ simulation is much simpler than trying to pursue an analytic result. If the simulations were actually treated as heuristics and followed up by rigorous treatments then this ease would not be a problem. However, it seems that in some fields simulations are increasingly discouraging follow-up work. Once a computational model hints at a previously paradoxical effect, it often spells doom for understanding. Follow-up papers that extend beyond number crunching to show general results are faced with the wall of “this isn’t novel” and rejected. The curse of computing is settling for examples or easy verification at the expense of a slower but deeper understanding.
As computer scientists start to rewrite their COMP-101 courses to remove the Goldbach conjecture as a motivation for the halting problem, I hope that in its place they will stress replacing heuristic models by rigorous understanding.

### References

Appel, K., & Haken, W. (1977). Every planar map is four colorable. Part I: Discharging. Illinois J. Math, 21: 429–490.

Appel, K., Haken, W., & Koch, J. (1977). Every planar map is four colorable. Part II: Reducibility. Illinois J. Math, 21: 491–567.

Helfgott, H.A. (2012). Minor arcs for Goldbach’s problem. arXiv preprint 1205.5252.

Helfgott, H.A. (2013). Major arcs for Goldbach’s problem. arXiv preprint 1305.2897.

De Millo, R. A., Lipton, R. J., & Perlis, A. J. (1979). Social processes and proofs of theorems and programs. Communications of the ACM, 22(5), 271-280.

Gonthier, G. (2008). The four colour theorem: Engineering of a formal proof. In Computer Mathematics. Springer Berlin Heidelberg. [preprint version]

Rav, Y. (1999). Why Do We Prove Theorems? Philosophia Mathematica, 7(1), 5-41. DOI: 10.1093/philmat/7.1.5

Tymoczko, T. (1979). The four-color problem and its philosophical significance. The Journal of Philosophy, 76(2): 57-83.

## Quasi-magical thinking and superrationality for Bayesian agents

As part of our objective and subjective rationality model, we want a focal agent to learn the probability that others will cooperate given that the focal agent cooperates ($p$) or defects ($q$). In a previous post we saw how to derive point estimates for $p$ and $q$ (and learnt that they are the maximum likelihood estimates):

$p_0 = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$, and $q_0 = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$

where $n_{XY}$ is the number of times Alice displayed behavior $X$ and saw Bob display behavior $Y$.

In the above equations, a number like $n_{CD}$ is interpreted by Alice as “the number of times I cooperated and Bob ‘responded’ with a defection”. I put ‘responded’ in quotations because Bob cannot actually condition his behavior on Alice’s action. Note that in this view, Alice is placing herself in a special position of actor and observing Bob’s behavior in response to her actions; she is failing to put herself in Bob’s shoes. Instead, she can realize that Bob would be interested in doing the same sort of sampling, and interpret $n_{CD}$ more neutrally as “the number of times agent 1 cooperates and agent 2 defects”; in this case she will see that, for Bob, the equivalent quantity is $n_{DC}$.

With this simple theory of mind that realizes that Bob is also capable of agency, the proper interpretation of $p_\mathrm{ToM}$ ($q_\mathrm{ToM}$; I am introducing subscripts to avoid overloading notation) becomes “how often an agent cooperates given that the other agent cooperates (defects)”. The proper estimates become:

$p_\mathrm{ToM} = \frac{2n_{CC} + 1}{2n_{CC} + n_{CD} + n_{DC} + 2}$, and $q_\mathrm{ToM} = \frac{n_{CD} + n_{DC} + 1}{n_{CD} + n_{DC} + 2n_{DD} + 2}$

Note that Alice is now double counting the symmetric events $CC$ and $DD$. If the real proportion of cooperation in the population is $r$ then in the limit of $n \rightarrow \infty$ the theory-of-mind estimates still converge to the rational $p_\mathrm{ToM}^* = r$ and $q_\mathrm{ToM}^* = r$. What if the agent employs quasi-magical thinking, i.e. knows that her actions do not affect the decisions of others but behaves as if they do?
If we follow Masel (2007) then quasi-magical thinking is equivalent to Alice updating her beliefs not only with the result of Bob’s action, but also with her simulation of what she would have done. For example, every time she sees a $CC$ or $CD$ event, she also counts a $CC$ event, since she would have cooperated in that case. This gives us the estimates:

$p_{1/2} = \frac{2 n_{CC} + n_{CD} + 1}{ 2(n_{CC} + n_{CD}) + 2}$, and $q_{1/2} = \frac{ n_{DC} + 1}{ 2(n_{DC} + n_{DD}) + 2}$

As before, we look at the limit of $n \rightarrow \infty$ with true proportion of cooperation $r$:

$p_{1/2}^* = \frac{r}{2} + \frac{1}{2}$ and $q_{1/2}^* = \frac{r}{2}$

In other words, regardless of the true probability of cooperation, this quasi-magical thinking agent will believe that her partner’s probability of cooperating is higher by $1/2$ when she cooperates than when she defects.
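A quick simulation sketch illustrates the limit, under the assumption (mine, for illustration) that both players cooperate independently with probability $r$; the quasi-magical counts drive $p_{1/2}$ toward $r/2 + 1/2$ and $q_{1/2}$ toward $r/2$:

```python
import random

def quasi_magical_estimates(r, rounds=100000, seed=1):
    # Alice and Bob each cooperate i.i.d. with probability r;
    # Alice tallies the joint events and forms the alpha = 1/2 estimates.
    random.seed(seed)
    n = {'CC': 0, 'CD': 0, 'DC': 0, 'DD': 0}
    for _ in range(rounds):
        alice = 'C' if random.random() < r else 'D'
        bob = 'C' if random.random() < r else 'D'
        n[alice + bob] += 1
    p = (2 * n['CC'] + n['CD'] + 1) / (2 * (n['CC'] + n['CD']) + 2)
    q = (n['DC'] + 1) / (2 * (n['DC'] + n['DD']) + 2)
    return p, q

print(quasi_magical_estimates(0.3))  # roughly (0.65, 0.15)
```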

I chose the subscript $1/2$ suggestively. In a more general treatment, Alice can choose how much weight to give her simulated self-action relative to Bob’s real action. If we let $\alpha$ be a measure of self-absorption, the weight she attributes to herself, with $1 - \alpha$ the weight she attributes to Bob, then the estimates become:

$p_\alpha = \frac{n_{CC} + \alpha n_{CD} + 1}{ n_{CC} + n_{CD} + 2}$, and $q_\alpha = \frac{ (1 - \alpha)n_{DC} + 1}{ n_{DC} + n_{DD} + 2}$

With the limit of large $n$ given by:

$p_\alpha^* = (1 - \alpha)r + \alpha$ and $q_\alpha^* = (1 - \alpha)r$

In words, an $\alpha$-self-absorbed quasi-magical thinking agent will believe her partner’s probability of cooperating to be higher by $\alpha$ when she cooperates, regardless of the real proportion of cooperation $r$. If we consider a completely self-absorbed agent ($\alpha = 1$) then we recover Hofstadter’s superrationality: an agent so dogmatic that he assumes every other agent will always arrive at exactly the same conclusion as he does about which behavior to pursue.
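As a sketch of the general case (the function name and the dictionary-of-counts representation are mine), the only thing that changes with $\alpha$ is the weighting of Alice’s simulated action relative to Bob’s real one:

```python
def alpha_estimates(n, alpha):
    # n['XY'] counts the times Alice played X and Bob played Y;
    # alpha = 0 recovers p_0 and q_0, alpha = 1 gives the superrational limit.
    p = (n['CC'] + alpha * n['CD'] + 1) / (n['CC'] + n['CD'] + 2)
    q = ((1 - alpha) * n['DC'] + 1) / (n['DC'] + n['DD'] + 2)
    return p, q
```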

If Alice suffers from these quasi-magical delusions but otherwise acts rationally on her beliefs, then we can calculate how she will behave. Suppose that she is $\alpha$-self-absorbed, thinks that she is playing the cooperate-defect game $\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$, and is out to maximize her expected utility. She will cooperate if:

$\alpha + (1 - \alpha)(r + (1 - r)U) > (1 - \alpha)rV$

A superrational agent ($\alpha = 1$) will always cooperate. For other agents, cooperation will be the risk-dominant strategy (equivalent to setting $r = 1/2$) when:

$1 + 2 \frac{\alpha}{1 - \alpha} > V - U$
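As a minimal sketch (function names and the example numbers are mine; it simply encodes the two inequalities above):

```python
def cooperates(alpha, r, U, V):
    # expected utility of cooperating vs defecting for an
    # alpha-self-absorbed agent who believes others cooperate w.p. r
    eu_cooperate = alpha + (1 - alpha) * (r + (1 - r) * U)
    eu_defect = (1 - alpha) * r * V
    return eu_cooperate > eu_defect

def cooperation_is_risk_dominant(alpha, U, V):
    # the r = 1/2 condition: 1 + 2*alpha/(1 - alpha) > V - U  (for alpha < 1)
    return 1 + 2 * alpha / (1 - alpha) > V - U

print(cooperates(alpha=0.5, r=0.5, U=-0.5, V=1.5))        # True
print(cooperation_is_risk_dominant(0.5, U=-0.5, V=1.5))   # True
```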

Note that by setting the tag-mutation rate $\nu = \frac{\alpha}{1 - \alpha}$ we recover the condition for cooperation in set-structured populations. Can we think of set-structured populations in cognitive terms as a type of quasi-magical thinking? A group-think delusion, or just a mathematical coincidence?

### References

Masel, J. (2007). A Bayesian model of quasi-magical thinking can explain observed cooperation in the public good game. Journal of Economic Behavior & Organization, 64(2), 216-231. DOI: 10.1016/j.jebo.2005.07.003

## Quasi-magical thinking and the public good

Cooperation is a puzzle because it is not obvious why cooperation, which is good for the group, is so common, despite the fact that defection is often best for the individual. Though we tend to view this issue through the lens of the prisoner’s dilemma, Artem recently pointed me to a paper by Joanna Masel, a mathematical biologist at Stanford, focusing on the public goods game [1]. In this game, each player is given 20 tokens and chooses how many of them to contribute to the common pool. Once players have made their decisions, the pool is multiplied by some factor $m$ (with $1 < m < n$, where $n$ is the number of players) and distributed equally back to all players. To optimize the group’s payoff, players should take advantage of the pool’s multiplicative effect by contributing all of their tokens. However, because a player’s share does not depend on the size of their contribution, it is easy to see that contributing is not the best individual strategy (the Nash equilibrium): by contributing nothing to the common pool, a player gets a share of the pool in addition to keeping all of the tokens they initially received. This conflict captures the puzzle of cooperation, which in this case is: why do human participants routinely contribute about half of their funds, if never contributing is individually optimal?
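A quick sketch of the payoffs makes the tension concrete (the function name and the specific numbers are mine, not from the paper):

```python
def public_goods_payoffs(contributions, m):
    # each player keeps (20 - c_i) and receives an equal share of the
    # multiplied pool, regardless of how much they put in
    share = m * sum(contributions) / len(contributions)
    return [20 - c + share for c in contributions]

# four players, m = 2 (so 1 < m < n): full contribution is best for the group...
print(public_goods_payoffs([20, 20, 20, 20], m=2))  # [40.0, 40.0, 40.0, 40.0]
# ...but a lone free-rider does better still for themselves
print(public_goods_payoffs([0, 20, 20, 20], m=2))   # [50.0, 30.0, 30.0, 30.0]
```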

As Masel points out, various attempts at explaining human cooperation in this context have failed. The proposal that players have not understood the game is contradicted by the finding that cooperation perseveres when subjects play for extended periods, and even resets to high levels when a new round is started [2]. The argument that players cooperate in an attempt to evoke reciprocity also falls flat because subjects who play anonymously, and with no knowledge of their partners’ contributions, not only continue to cooperate, but do so at equal [3, 4] or even higher levels [5]. A final proposal of particular interest to us is the suggestion that players are using a utility function (i.e., considering a payoff matrix) that deviates from objective reality. In other words, they are considering subjective factors such as fairness, the group’s payoff, the rewarding nature of contributing, and so on.

To explain the data and yet stray as little as possible from the assumption of rationality, Masel proposes that human reasoning may be captured by the idea “what if everyone else thought like me?” Specifically, even though players understand there is no causal link between their own behavior and that of others, they may nevertheless recognize that a correlation exists, and this realization may be sufficient motivation to contribute. Famously proposed by Shafir and Tversky [6], this phenomenon is known as quasi-magical thinking and involves acting as if one erroneously believes (without actually believing) that one’s actions affect the behavior of others. This principle may best be captured by the sentiment often expressed by voters, who individually have very little influence on the outcome of any given election: “if I don’t vote, then who will?” In this case, players contribute because they are acting as if they believe that contributing makes others more likely to contribute.

Let’s just hope they think like me and cooperate.

(As an aside, Artem points out that this idea resembles Douglas Hofstadter’s concept of superrationality, a type of decision making where individuals assume that, in a symmetric game, both parties will arrive at the same answer. Because unilateral actions are off limits, this results in cooperation instead of defection, since cooperation is the best mutual strategy. The difference, in this case, is that players do not assume that others will mirror their actions; rather, they are simply sensitive to the fact that their behavior is likely to be correlated to some degree with the behavior of others.)

To test this idea, Masel considers agents using a Bayesian update scheme to estimate how much others contribute and how much these contributions vary. This would ordinarily result in a race to the bottom, with agents converging to the Nash equilibrium (no one contributing any of their tokens). Masel avoids this by having agents treat their own expected contribution as a data point akin to other players’ contributions. This expected contribution is weighted more heavily initially, while an agent’s confidence in its estimate of the average contribution is low, and becomes weighted less heavily relative to external data as time goes on and confidence grows. As a result, agents can increase their estimate of the average contribution simply by expecting to contribute more themselves, particularly when not enough reliable data has been collected to disagree.
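Here is a deliberately loose sketch of the self-as-data idea (my toy rendering, not Masel’s actual update equations): the agent’s own expected contribution acts like a prior worth a fixed number of pseudo-observations, so it dominates the estimate early on and is swamped by real data as observations accumulate:

```python
def estimated_mean_contribution(own_expected, observed, prior_weight=5.0):
    # own_expected: what the agent itself intends to contribute
    # observed: list of contributions actually seen from others
    # prior_weight: how many pseudo-observations the self-data is worth
    total = prior_weight * own_expected + sum(observed)
    return total / (prior_weight + len(observed))

print(estimated_mean_contribution(15, []))                    # 15.0 (no data yet)
print(estimated_mean_contribution(15, [5, 5, 5, 5, 5] * 10))  # about 5.9
```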

In principle, such a bias seems reasonable. It would encourage cooperation, despite cooperation not being individually optimal, and avoids strongly violating the assumption of rationality by explaining the tendency to cooperate as a consequence of what data is used to predict others’ behavior. This broadly agrees with the finding that making choices influences expectations [7] and, conversely, that estimating others’ actions prior to making a choice leads to reduced contributions [8]. In short, leveraging the knowledge that “I am like them” may explain, in rational terms, seemingly irrational cooperation in the public goods game.

### References

1. Masel, J. (2007). A Bayesian model of quasi-magical thinking can explain observed cooperation in the public good game. Journal of Economic Behavior & Organization, 64 (2), 216-231 DOI: 10.1016/j.jebo.2005.07.003
2. Isaac, R. M., Walker, J. M., & Thomas, S. H. (1984). Divergent evidence on free riding: An experimental examination of possible explanations. Public Choice, 43(2), 113-149. doi: 10.1007/bf00140829
3. Brandts, J., & Schram, A. (2001). Cooperation and noise in public goods experiments: applying the contribution function approach. Journal of Public Economics, 79(2), 399-427. doi: 10.1016/s0047-2727(99)00120-6
4. Weimann, J. (1994). Individual behaviour in a free riding experiment. Journal of Public Economics, 54(2), 185-200. doi: 10.1016/0047-2727(94)90059-0
5. Andreoni, J. (1988). Why free ride? Journal of Public Economics, 37(3), 291-304. doi: 10.1016/0047-2727(88)90043-6
6. Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reasoning and choice. Cognitive Psychology, 24(4), 449-474. doi: 10.1016/0010-0285(92)90015-t
7. Dawes, R. M., McTavish, J., & Shaklee, H. (1977). Behavior, communication, and assumptions about other people’s behavior in a commons dilemma situation. Journal of Personality and Social Psychology, 35(1), 1-11.
8. Croson, R. T. A. (2000). Thinking like a game theorist: factors affecting the frequency of equilibrium play. Journal of Economic Behavior & Organization, 41(3), 299-314. doi: 10.1016/s0167-2681(99)00078-5