Baldwin effect and overcoming the rationality fetish
October 6, 2013
Introduced in 1896 by psychologist J.M. Baldwin, then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three-step process (some steps can occur in parallel, or partially so):
- Organisms adapt to the environment individually.
- Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
- These hereditary traits are favoured by natural selection and spread in the population.
The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan 1896; Osborn 1896, 1897) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.
The easiest way to get around the Simpson-Baldwin effect is to postulate that selectively significant environmental change is happening on the time-scale of a single organism’s lifespan. This justification fares well in the age of industrialization and drastic human-generated environmental change; it can even apply to other animals, like fishes, where humans cause a strong coupling between the ecological and evolutionary dynamics. However, it seems less relevant to the paleolithic human, and yet we attribute unique cognitive flexibility and learning to the development of early stone tools (although the uniqueness might be more physical and less mental) that characterize that period. Further, we usually attribute our learning abilities to our bigger brains and those are best explained by the social brain hypothesis. This suggests that we should turn our attention from external environmental pressures to internal (to the species) social pressures. In other words, we should be using evolutionary game theory to look at frequency-dependent selection.
Smead & Zollman (2009) considered general strategic plasticity (learning would be a specific type) in a game theoretic setting. Given an arbitrary symmetric evolutionary game G, they introduce a new strategy L that learns to play a best response to any strategy in G and a Nash equilibrium against itself, but carries a small cognitive cost c; the resulting game is GL. They show that for all games G without a Pareto dominant mixed strategy Nash equilibrium (i.e. for any generalized social dilemma game), L is not an evolutionarily stable strategy (ESS) in GL. Not being an ESS means that we will never find a monomorphic population of learners, but it doesn’t mean that learners can’t exist as a fraction of a polymorphic population; this is what we usually expect of social learners, for instance. However, they show that even this is possible only for Hawk-Dove-like games where every pure strategy does poorly against itself. Specifically, if there exists a strategy s in G that is a best response to itself, then L will go to extinction. They also consider the case where the cost of learning is not associated with development, but instead depends on the opponent. In such a setting, for L to be an ESS it must be the case that in a world of learners, it is more costly to be a non-learner than a learner. This might be possible for imitation, where it is cheaper than innovation, but it is typically not the case for more sophisticated types of learning.
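To make the construction of GL concrete, here is a minimal Python sketch. It is my own reading of the setup described above, not code from Smead & Zollman: the function names, the convention that `G[i][j]` is the payoff to row strategy `i` against column strategy `j`, the particular prisoner's dilemma payoffs, and the choice to pass the learner-versus-learner Nash payoff in by hand are all assumptions made for illustration.

```python
def extend_with_learner(G, nash_payoff, c):
    """Build GL from a symmetric game G by appending a learner L (last index).

    G[i][j] is the payoff to strategy i against strategy j.
    nash_payoff is the payoff L earns against another L (assumed supplied
    by the caller), and c is the fixed cognitive cost of learning.
    """
    n = len(G)
    GL = [list(row) + [0.0] for row in G] + [[0.0] * (n + 1)]
    for s in range(n):
        # L best-responds to the fixed strategy s.
        br = max(range(n), key=lambda t: G[t][s])
        GL[n][s] = G[br][s] - c   # learner's payoff against s
        GL[s][n] = G[s][br]       # s's payoff against a learner playing br
    GL[n][n] = nash_payoff - c    # learners play a Nash equilibrium together
    return GL


def is_ess(A, i):
    """Check the standard pure-strategy ESS conditions for strategy i in A."""
    for j in range(len(A)):
        if j == i:
            continue
        strict = A[i][i] > A[j][i]
        weak = A[i][i] == A[j][i] and A[i][j] > A[j][j]
        if not (strict or weak):
            return False
    return True


# Illustrative prisoner's dilemma (0 = cooperate, 1 = defect); values assumed.
pd = [[3.0, 0.0],
      [5.0, 1.0]]
GL_pd = extend_with_learner(pd, nash_payoff=1.0, c=0.1)
learner = len(pd)
print(is_ess(GL_pd, learner), is_ess(GL_pd, 1))
```

In this example defect is a best response to itself, so a defector among learners earns the same payoff as a learner but without paying c, which is why the learner fails the ESS check while defect passes it.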
As an ultimate relaxation, Smead & Zollman (2009) make the mechanism for strategic plasticity free with the only costs coming from mistakes during learning (nobody is perfect before they see any data). In particular, let be the proportion of interactions where L makes sub-optimal decisions, then among two-strategy games, strategic plasticity is an ESS only for games of the form where . That is part of the assurance and prisoner’s dilemma section of cooperate-defect games. Further, to achieve this ESS the learner needs a pretty sophisticated theory of mind: the strategy needs to respond not to what other learners are doing, but what they are ‘trying’ to do.
To study the Simpson-Baldwin effect, Smead & Zollman (2009) extend beyond their analysis of evolutionary stability and consider dynamics. They simulate the discrete-time replicator equation in an inviscid population, and show that starting from a point far from equilibrium, the proportion of strategically plastic agents increases drastically as they are initially selected for in assurance and PD games. However, once they make up a large fraction of the population, the non-learning strategies achieve higher fitness and drive the adaptable agents to extinction. This is exactly what Simpson’s three-step recipe predicts. For Hawk-Dove dynamics, the effect is much less drastic, but sometimes present. Thus, for most games and reasonable costs of cognition, frequency-dependent selection is not sufficient to generate strategic plasticity at equilibrium.
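The rise and fall of learners can be illustrated with a small discrete-time replicator simulation. This is only a sketch under my own assumptions, not a reproduction of Smead & Zollman's simulations: the payoff matrix is a prisoner's dilemma extended by a learner that pays a fixed cost c = 0.1, the initial condition is hand-picked, and all payoffs are kept non-negative so that the update x_i' = x_i f_i / (mean fitness) is well defined.

```python
def replicator_step(x, A):
    """One step of the discrete-time replicator equation x_i' = x_i * f_i / f_bar."""
    f = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    f_bar = sum(xi * fi for xi, fi in zip(x, f))
    return [xi * fi / f_bar for xi, fi in zip(x, f)]


def simulate(x0, A, steps):
    """Iterate the replicator map and return the whole trajectory."""
    traj = [list(x0)]
    for _ in range(steps):
        traj.append(replicator_step(traj[-1], A))
    return traj


# Strategies: 0 = cooperate, 1 = defect, 2 = learner (assumed payoffs).
c = 0.1
A = [[3.0, 0.0, 0.0],
     [5.0, 1.0, 1.0],
     [5.0 - c, 1.0 - c, 1.0 - c]]

# Start far from equilibrium: mostly cooperators, few defectors and learners.
traj = simulate([0.90, 0.05, 0.05], A, steps=500)
learner_share = [x[2] for x in traj]
print("initial:", learner_share[0])
print("peak:", max(learner_share))
print("final:", learner_share[-1])
```

In this toy run the learners first spread by exploiting cooperators, then lose out to defectors, who earn the same payoffs without paying the cost of learning.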
What if learning is completely free, and there are many learning strategies present? What if the learner doesn’t have to respond to individual opponents but can learn a global strategy? At least in this setting, can we use evolutionary game theory to justify rational learning rules? Harley (1981) and Maynard Smith (1982) thought so, and argued that the only evolutionarily stable learning rules are ones that generate ESS behaviour. Smead (2012) shows that this is not typically the case. He assumes that learning is fast relative to reproduction, and considers rules like Bayesian learning that are behaviourally persistent. A population of learners is behaviourally persistent if introducing a few mutants does not drastically change the behaviour of the focal population. Bayesian learning would satisfy this, since if only a few mutants are introduced then interactions with them are infrequent and thus the agents’ beliefs about the world are barely changed by these few samples. Smead (2012) shows that if a population is an evolutionarily stable state then it is not behaviourally persistent. The most interesting result for me, though, is that if you slow the rate of learning then you can make the learner stable against any invader that leads away from equilibrium (although an agent that expresses the equilibrium behavioural proportions without learning can still invade). I think it is relatively surprising to see a slower learning rate resulting in higher resilience to invasion.
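The intuition for why Bayesian learning is behaviourally persistent can be checked with a toy Beta-Bernoulli update. This is my own illustration and not Smead's model: the uniform Beta(1, 1) prior, the use of expected counts in place of sampled ones, and the names `p_resident`, `p_mutant` and `eps` are all assumptions.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) belief after Bernoulli data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


def expected_belief(p_resident, p_mutant, eps, trials, prior_a=1.0, prior_b=1.0):
    """Belief that an opponent plays action A, using expected observation counts.

    Residents play A with probability p_resident, mutants with p_mutant,
    and a fraction eps of opponents are mutants.
    """
    p_observed = (1.0 - eps) * p_resident + eps * p_mutant
    return posterior_mean(prior_a, prior_b, p_observed * trials, trials)


baseline = expected_belief(0.5, 1.0, eps=0.0, trials=1000)
invaded = expected_belief(0.5, 1.0, eps=0.01, trials=1000)
shift = invaded - baseline
print(baseline, invaded, shift)
```

With 1% of opponents replaced by mutants that always play A, the learner's belief moves by less than the mutant fraction itself, so its behaviour is essentially unchanged.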
The above invasions usually happen by neutral drift, and so, behaviourally, Nash equilibrium (NE) is still expressed, even if it is not evolutionarily stable. Smead (2012) summarizes this nicely:
“This means that there is reason to expect a population to play according to a NE, but that the learning rules used by individuals may not (on their own) take the population to a NE. Consequently, pointing to the evolutionary success of equilibrium-learning rules does not clearly support equilibrium behaviour any more than more basic evolutionary process that act on behaviour directly.”
So, to answer the rationality fetish: no, Bayesian learning is not a fixed-point of Darwinian evolution. At least it is not in the most basic and intuitive model of inviscid dynamics. However, in the special case of social learning and Rogers’ paradox, it is known that spatial structure can help learning strategies enhance the fitness of their bearers (Rendell et al., 2010). As such, it might be possible to achieve stability of general learning (maybe even under cognitive costs) in spatial populations. Of course, this isn’t the only way around the problem. We could also follow Valiant (2009) and express machine learning and evolution in the same framework, and show that evolution is a strict subclass of general PAC-learning and thus some behaviours are learnable but not evolvable. In terms of Simpson’s three steps, this would be equivalent to saying that step two is not always possible. My preferred option, though, would be to look at rational learning not as superior to memorized strategies, but as a make-do mechanism for when there are too many basic strategies to memorize. This would give us the perplexing view of rational learning as a consequence of constraints on our descriptive complexity and an artifact of bounded rationality.
Baldwin, J.M. (1896). A new factor in evolution. Amer. Nat., 30: 441-451, 536-553.
Baldwin, J.M. (1902). Development and evolution. Macmillan, New York.
Boyd, R., & Richerson, P.J. (1985). Culture and the evolutionary process. Chicago Univ. Press, Chicago.
Harley, C.B. (1981). Learning the evolutionary stable strategy. Journal of Theoretical Biology, 89: 611-633.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge University Press, Cambridge.
Morgan, C.L. (1896). Habit and instinct. Arnold, London.
Osborn, H.F. (1896). A mode of evolution requiring neither natural selection nor the inheritance of acquired traits. Trans. New York Acad. Sci., 15: 141-142, 148.
Osborn, H.F. (1897). Organic selection. Science, 583-587.
Rendell, L., Fogarty, L., & Laland, K. N. (2010). Rogers’ paradox recast and resolved: population structure and the evolution of social learning strategies. Evolution, 64(2): 534-548.
Rogers, A. (1988). Does biology constrain culture? Am. Anthropol. 90.
Simpson, G.G. (1953). The Baldwin effect. Evolution, 7(2): 110-117.
Smead, R., & Zollman, K. J. (2009). The stability of strategic plasticity. Carnegie Mellon University, Department of Philosophy, Technical Report 182.
Smead, R. (2012). Game theoretic equilibria and the evolution of learning. Journal of Experimental & Theoretical Artificial Intelligence, 24(3): 301-313. DOI: 10.1080/0952813X.2012.695444
Valiant, L.G. (2009) Evolvability. Journal of the ACM 56(1): 3