Rogers’ paradox: Why cheap social learning doesn’t raise mean fitness

It’s Friday night, you’re lonely, you’re desperate and you’ve decided to do the obvious—browse Amazon for a good book to read—when, suddenly, you’re told that you’ve won one for free. Companionship at last! But, as you look at the terms and conditions, you realize that you’re only given a few options to choose from. You have no idea what to pick, but luckily you have some help: Amazon lets you read through the first chapter of each book before choosing and, now that you think about it, your friend has read most of the books on the list as well. So, how do you choose your free book?

If you answered “read the first chapter of each one,” then you’re a fan of asocial/individual learning. If you decided to ask your friend for a recommendation, then you’re in favor of social learning. Individual learning would probably have taken far more time here than social learning, which is thought to be a common scenario: Social learning’s prevalence is often explained in terms of its ability to reduce costs—such as metabolic, opportunity or predation costs—below those incurred by individual learning (Aoki et al., 2005; Kendal et al., 2005; Laland, 2004). However, a model by Rogers (1988) famously showed that this is not the whole story behind social learning’s evolution.


Baldwin effect and overcoming the rationality fetish

G.G. Simpson and J.M. Baldwin


As I’ve mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn’t surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a “perfect Bayesian reasoner as a fixed point of Darwinian evolution”. This lets them side-step observed non-Bayesian behavior in humans by saying that we are evolving towards, but haven’t yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is naive: it ignores critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin, then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three-step process (some steps of which can occur in parallel or partially so):

  1. Organisms adapt to the environment individually.
  2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
  3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan 1896; Osborn 1896, 1897), this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.

Algorithmic view of historicity and separation of scales in biology

A Science publication is one of the best ways to launch your career, especially if it is based on your undergraduate work, part of which you carried out with makeshift equipment in your dorm! That is the story of Thomas M.S. Chang, who in 1956 started experiments (partially carried out in his residence room in McGill’s Douglas Hall) that led to the creation of the first artificial cell (Chang, 1964). This was, in the words of New Scientist in 1989, an “elegantly simple and intellectually ambitious” idea that “has grown into a dynamic field of biomedical research and development”: a field that promises to connect biology and computer science by physically realizing John von Neumann’s dream of a self-replicating machine.


Learning and evolution are different dynamics

A couple of weeks ago, if you had randomly woken me in the middle of the night and demanded to know the fundamental difference between evolution and learning as adaptive processes, I would probably have responded: “how did you get into my house? and umm… I guess they are mostly the same, it is just a matter of time-scales and domain.” This answer stems from my urge to generalize and find the overarching similarities between ideas, and evolution and learning have a lot in common. Both are more likely to propagate effective behaviors than ineffective ones, and both generate novelty through randomized and often unguided processes: mutation and innovation. In fact, in evolutionary game theory, social imitation and reproduction are used almost interchangeably in mathematical models. Most computational models can be interpreted as either biological or cultural evolution without changing any code, just the words used to describe the agents.

In the Hammond & Axelrod (2006) model of ethnocentrism, for example, we can stretch across the whole range from biological to cultural evolution depending on our interpretation:

  • If we interpret the agents as single bacteria and the tags as quorum markers, then we are obviously in the standard green-beard effect regime and our evolution can only be interpreted as biological.
  • If we interpret the agents as humans (or other animals) and tags as skin color (or other physical trait) then our strategy transmission might be biological or cultural, but the tag transmission is clearly biological.
  • If we interpret the agents as humans and tags as language accents, then both transmissions are cultural with only a little room to argue for biology.
  • Finally, if we interpret agents as villages and tags as their religion then it is almost impossible to argue for biology and the dynamics must clearly be of cultural evolution.

But we never changed any specification of the model, only the language we used to describe it, so the dynamics were invariant. I usually view this generality as an advantage of the model: we can reason about either dynamic, cultural or biological. However, it can also be a weakness: the dynamics are underspecified and inaccurate representations of both!

From a practical point of view, if I want to combine evolution and learning in one model, then it doesn’t make sense to do so (and expect anything interesting) if they follow exactly the same dynamics. Since I am becoming more interested in social learning and its potential analogies to evolutionary game theory, it is important to figure out what fundamental differences the two adaptive processes might have. Thankfully, evolutionary economists have already thought about this.

For an evolutionary economist, the agents are corporations and the heritable material is business practices. In domain, they are squarely working with learning and cultural evolution, but they view the resulting dynamics as analogous to the biology from which they borrow the name. Since agent-based modeling is an important methodology for these economists, they have thought carefully about the similarities and differences between evolutionary and learning models.

Brenner (1998) explicitly compares models of evolution and learning. For evolution, he takes the EGT model of replicator-mutator dynamics, and for learning he looks at his earlier Variation-Imitation-Decision (VID) model (Brenner, 1996). Since the VID model doesn’t seem to be a standard approach, I won’t go into the details of the technical comparison. I will instead highlight the distinction Brenner draws that I think generalizes to most models of evolution and learning: objectivity of fitness.

In a biological setting, we have a clear objective measure of fitness: number of offspring. As such, it is relatively uncontroversial to associate a given behavior with a fitness value. In a lot of social learning settings the same approach is followed, but it is not as obviously justified. The fitness of a meme is subjective and varies between potential adopters. Some agents might be more susceptible to a given idea because of the ideas they already hold, their past history with various behaviors (individual historicity), or just their general outlook; other agents might be less so. Two agents might observe the same behavior: the first might think the behavior is good (and thus maybe worth imitating), while the second concludes that it is not helpful (and thus probably not worth imitating). In a general social setting, we cannot view a heritable trait as having an inherent fitness; its fitness depends on the agents that will consider it for copying.

If we wanted to incorporate a lack of objective fitness into an EGT model, we could do so with the objective versus subjective rationality model. In this model, each agent has a different subjective conception of what the objective game of the environment is. As such, if Alice and Bob view the behavior of Eve, then they will judge its effectiveness not by Eve’s conception of the game (which they do not know) but by their own: Alice might calculate one utility for the behavior she saw Eve display, and Bob could calculate a completely different utility. From the point of view of imitation, Eve’s behavior would not have an inherent fitness. At the same time, the obj-vs-subj model also has elements of standard evolution (in the vertical transmission of conceptions of the game) and can be a good groundwork for building models that capture the different dynamics of evolution and learning.
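To make the Alice-and-Bob example concrete, here is a minimal Python sketch, assuming each agent's subjective conception of the game is a symmetric 2x2 payoff table indexed by its own action and its partner's action. The `Agent` class, the `judge` method, and all payoff numbers are hypothetical illustrations, not the actual specification of the obj-vs-subj model.

```python
from dataclasses import dataclass

# Hypothetical encoding of the two actions.
COOPERATE, DEFECT = 0, 1


@dataclass
class Agent:
    name: str
    # payoff[my_action][partner_action] under this agent's *subjective* game
    payoff: tuple

    def judge(self, action: int, partner_action: int) -> float:
        """Utility this agent assigns to an observed behavior, using its own game."""
        return self.payoff[action][partner_action]


# Illustrative numbers: Alice perceives a Prisoners' dilemma, Bob a Harmony game.
alice = Agent("Alice", ((3.0, 0.0), (5.0, 1.0)))
bob = Agent("Bob", ((5.0, 3.0), (2.0, 1.0)))

# Both watch Eve defect against a cooperating partner.
for observer in (alice, bob):
    print(observer.name, observer.judge(DEFECT, COOPERATE))
```

Alice scores Eve's defection at 5.0 and Bob scores it at 2.0, so the same observed behavior carries no single fitness from the point of view of imitation; Eve's own conception of the game never enters the calculation.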

Now, if you break into my house in the middle of the night to question me about evolution and learning then, while I wait for the cops to come and remove you, I might explain the importance of objective versus subjective measures of fitness.

References

Brenner, T. (1996). Learning in a repeated decision process: A mutation-imitation-decision model. Papers on Economics and Evolution #9603, Max-Planck-Institut, Jena.

Brenner, T. (1998). Can evolutionary algorithms describe learning processes? Journal of Evolutionary Economics, 8(3), 271-283. DOI: 10.1007/s001910050064

Hammond, R., & Axelrod, R. (2006). The evolution of ethnocentrism. Journal of Conflict Resolution, 50(6), 926-936.

Social learning dilemma

Last week, my father sent me a link to the 100 top-ranked specialties in the sciences and social sciences. The Web of Knowledge report considered 10 broad areas[1] of natural and social science, and for each one listed 10 research fronts that it considers the key fields to watch in 2013: “hot areas that may not otherwise be readily identified”. A subtle hint from my dad that I should refocus my research efforts? Strange advice to get from a parent, especially since you would usually expect classic words of wisdom like: “if all your friends jumped off a bridge, would you jump too?”


And it says a lot about you that when your friends jump off a bridge en masse, your first thought is apparently 'my friends are all foolish and I won't be like them' and not 'are my friends okay?'.

So, which advice should I follow? Should I innovate and focus on my own fields of interest, or should I imitate and follow the trends? Conveniently, the field best equipped to answer this question, i.e. “social learning strategies and decision making”, was sixth of the top ten research fronts for “Economics, Psychology, and Other Social Sciences”[2].

For the individual, there are two sides to social learning. On the one hand, social learning is tempting because it allows agents to avoid the effort and risk of innovation. On the other hand, social learning can be error-prone and can lead individuals to acquire inappropriate and outdated information if the environment is constantly changing. For the group, social learning is great for preserving and spreading effective behavior. However, if a group has only social learners, then in a changing environment it will not be able to innovate new behavior, and average fitness will decrease as the fixed set of available behaviors in the population becomes outdated. Since I always want to hit every nail with the evolutionary game theory hammer, this seems like a public goods game: the public good is effective behaviors, defection is frequent imitation, and cooperation is frequent innovation.
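The public goods heuristic can be sketched with a textbook linear public goods game. This is a toy of my own framing, not a model from the tournament: assume each of `k` innovators in a group of `n` pays a learning cost `c` and produces a behavior worth `b` that every group member can copy and exploit, so everyone receives `b * k / n`. The function name and all parameter values are hypothetical, and environmental change and copying errors are ignored.

```python
def group_payoffs(k: int, n: int, b: float, c: float) -> tuple:
    """Toy linear public goods game of innovation (hypothetical parameters).

    k innovators in a group of n each pay a learning cost c and produce a
    behavior worth b that every member can copy, so each member receives
    b * k / n. Returns (innovator_payoff, imitator_payoff).
    """
    shared = b * k / n
    return shared - c, shared


n, b, c = 10, 3.0, 1.0
mixed = group_payoffs(5, n, b, c)
print(mixed)  # imitators out-earn innovators inside the same group

# An innovator who switches to imitation changes k to k - 1.
gain = group_payoffs(4, n, b, c)[1] - group_payoffs(5, n, b, c)[0]
print(gain)  # equals c - b / n, positive here, so imitation is individually tempting

# Yet an all-innovator group beats an all-imitator group.
print(group_payoffs(n, n, b, c)[0], group_payoffs(0, n, b, c)[1])
```

Whenever `b / n < c < b`, switching to imitation always pays the individual, while a group of innovators out-earns a group of imitators: the signature of a social dilemma.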

Although we can trace the study of evolution of cooperation to Peter Kropotkin, the modern treatment — especially via agent-based modeling — was driven by the innovative thoughts of Robert Axelrod. Axelrod & Hamilton (1981) ran a computer tournament where other researchers submitted strategies for playing the iterated prisoners’ dilemma. The clarity of their presentation, and the surprising effectiveness of an extremely simple tit-for-tat strategy motivated much of the current work on cooperation. True to their subject matter, Rendell et al. (2010) imitated Axelrod and ran their own computer tournament of social learning strategies, offering 10,000 euros for the best submission. By cosmic coincidence, the prize went to students of cooperation: Daniel Cownden and Tim Lillicrap, two graduate students at Queen’s University, the former a student of mathematician and notable inclusive-fitness theorist Peter Taylor.

A restless multi-armed bandit served as the learning environment. The agent could select which of 100 arms to pull in order to receive a payoff drawn independently (for each arm) from an exponential distribution. It was made “restless” by redrawing each arm’s payoff with probability p_C every round. A dynamic environment was chosen because copying outdated information is believed to be a central weakness of social learning, and because Papadimitriou & Tsitsiklis (1999) showed that solving this bandit (finding an optimal policy) is PSPACE-complete[3], or in layman’s terms: very intractable.
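A minimal Python sketch of such an environment, assuming payoffs come from an exponential distribution with a hypothetical mean of 10 and that every arm is independently redrawn with probability `p_c` each round. The class name, `p_c=0.05`, and the mean are my own illustrative choices, not the tournament's actual code or parameters.

```python
import random


class RestlessBandit:
    """Sketch of a restless multi-armed bandit (assumed details, not the tournament code).

    Each arm's payoff is drawn from an exponential distribution with mean
    `mean_payoff`, and every round each arm is independently redrawn with
    probability `p_c`.
    """

    def __init__(self, n_arms=100, p_c=0.05, mean_payoff=10.0, seed=None):
        self.rng = random.Random(seed)
        self.p_c = p_c
        self.mean_payoff = mean_payoff
        self.payoffs = [self._draw() for _ in range(n_arms)]

    def _draw(self):
        # random.expovariate takes the rate parameter, i.e. 1 / mean.
        return self.rng.expovariate(1.0 / self.mean_payoff)

    def step(self):
        """Advance one round: each arm is redrawn with probability p_c."""
        for arm in range(len(self.payoffs)):
            if self.rng.random() < self.p_c:
                self.payoffs[arm] = self._draw()

    def pull(self, arm):
        """Payoff for exploiting `arm` this round."""
        return self.payoffs[arm]


env = RestlessBandit(seed=1)
best_now = max(range(len(env.payoffs)), key=env.pull)
for _ in range(50):
    env.step()
best_later = max(range(len(env.payoffs)), key=env.pull)
print(best_now, best_later)  # the best arm can move, so old knowledge goes stale
```

The point of the restlessness is visible in the usage lines: after enough rounds the arm an agent learned about may no longer be the best one, which is exactly the risk a copier takes when observing stale behavior.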

Participants submitted specifications for learning strategies that could perform one of three actions at each time step:

  • Innovate — the basic form of asocial learning, the move returns accurate information about the payoff of a randomly selected behavior that is not already known by the agent.
  • Observe — the basic form of social learning, the observe move returns noisy information about the behavior and payoff being demonstrated by a randomly selected individual. This could return nothing if no other agent played an exploit move this round, or if the behavior was identical to one the focal agent already knows. If some agent is selected for observation then unlike the perfect information of innovate, noise is added: with probability p_\text{copyActWrong} a randomly chosen behavior is reported instead of the one performed by the selected agent, and the payoff received is reported with Gaussian noise with variance \sigma_\text{copyPayoffError}.
  • Exploit — the only way to acquire payoffs by using one of the behaviors that the agent has previously added to its repertoire with innovate and observe moves. Since no payoff is given during innovate and observe, they carry an inherent opportunity cost of not exploiting existing behavior.
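The OBSERVE move described above can be sketched as a single Python function. This is a hypothetical reconstruction, not the tournament's code: it assumes the "already known" check is applied after any copying error, and since the payoff noise is described by its variance, it takes the square root to get the standard deviation that `random.gauss` expects. The function and argument names are my own.

```python
import math
import random


def observe(demonstrations, known, n_behaviors, p_copy_act_wrong,
            copy_payoff_error_var, rng):
    """Hypothetical sketch of an OBSERVE move.

    demonstrations: (behavior, payoff) pairs from agents that played EXPLOIT this round.
    known: set of behaviors already in the observer's repertoire.
    Returns (behavior, noisy_payoff), or None if nothing new is learned.
    """
    if not demonstrations:
        return None  # nobody exploited this round, so there is nothing to observe
    behavior, payoff = rng.choice(demonstrations)
    if rng.random() < p_copy_act_wrong:
        behavior = rng.randrange(n_behaviors)  # a random behavior is reported instead
    if behavior in known:
        return None  # assumption: the "already known" check happens after the copy error
    # Noise is specified by its variance; random.gauss expects a standard deviation.
    noise = rng.gauss(0.0, math.sqrt(copy_payoff_error_var))
    return behavior, payoff + noise


rng = random.Random(0)
demos = [(7, 12.0), (42, 3.5)]
print(observe(demos, known={7}, n_behaviors=100, p_copy_act_wrong=0.1,
              copy_payoff_error_var=1.0, rng=rng))
```

Note the contrast with INNOVATE, which would simply return the exact payoff of an unknown behavior: every source of imprecision in social learning lives in the two noise branches here.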

The payoffs were used to drive replicator dynamics via a death-birth process. The fitness of an agent was given by its total accumulated payoff divided by the number of rounds it had been alive. At each round, every agent in the population had a 1/50 probability of expiring. The resulting empty spots were filled by offspring of the remaining agents, with the probability of being selected for reproduction proportional to agent fitness. Offspring inherited their parent’s learning strategy unless a mutation occurred, in which case the offspring was assigned a learning strategy chosen at random from those considered in the simulation.
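One round of this death-birth update can be sketched in Python as follows. The 1/50 death probability comes from the description above; the mutation rate `p_mutation=0.02`, the uniform fallback when all surviving fitnesses are zero, and the fixed per-strategy fitness values in the usage lines are hypothetical simplifications (in the tournament, fitness is each agent's own mean payoff per round).

```python
import random


def death_birth_round(population, fitness, strategies, rng,
                      p_death=1 / 50, p_mutation=0.02):
    """One round of the death-birth update (sketch; p_mutation is hypothetical).

    population: list of strategy labels, one per agent.
    fitness: per-agent total payoff divided by rounds alive, same order.
    strategies: pool that mutants draw from uniformly at random.
    """
    dead = [i for i in range(len(population)) if rng.random() < p_death]
    dead_set = set(dead)
    survivors = [i for i in range(len(population)) if i not in dead_set]
    if not dead or not survivors:
        return list(population)  # nothing to replace, or nobody left to reproduce
    weights = [max(fitness[i], 0.0) for i in survivors]
    if sum(weights) == 0:
        weights = [1.0] * len(survivors)  # fall back to a uniform choice of parent
    new_population = list(population)
    for i in dead:
        # Parents are chosen among survivors in proportion to their fitness.
        parent = rng.choices(survivors, weights=weights, k=1)[0]
        if rng.random() < p_mutation:
            new_population[i] = rng.choice(strategies)
        else:
            new_population[i] = population[parent]
    return new_population


rng = random.Random(0)
pop = ["observer"] * 50 + ["innovator"] * 50
fitness_of = {"observer": 2.0, "innovator": 1.0}  # made-up fitness values
for _ in range(200):
    fit = [fitness_of[s] for s in pop]
    pop = death_birth_round(pop, fit, list(fitness_of), rng)
print(pop.count("observer"), pop.count("innovator"))
```

Because reproduction depends only on fitness relative to the other survivors, this update rewards relative rather than absolute payoff, which matters for how the winning strategies behaved.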

A total of 104 learning strategies were received for the tournament. Most were from academics, but three were from high school students (with one placing in the top 10). A pairwise tournament was held to test the probability of a strategy invading any other strategy (i.e., when a single individual with a new strategy is introduced into a homogeneous population of another strategy). This round-robin tournament was used to select the 10 best strategies for advancement to the melee stage. During the round-robin, p_C, p_\text{copyActWrong}, and \sigma_\text{copyPayoffError} were kept fixed; only during the melee stage, with all of the top-10 strategies present, did the experimenters vary these parameters.

Mean score of the 104 learning strategies depending on the proportion of learning actions (both INNOVATE and OBSERVE) in the left figure, and the proportion of OBSERVE actions in the right figure. These are figures 2A and 2C from Rendell et al. (2010).


Unsurprisingly, using lots of EXPLOIT moves is essential to good performance, since this is the only way to earn payoff. In other words: less learning and more doing. However, a certain minimal amount of learning is needed to get your doing off the ground, and within this learning there is a clear positive correlation between the amount of social learning and success in invading other strategies. The best strategies used the limited information given to them to estimate p_C and used that estimate to better predict and quickly react to changes in the environment. However, they also relied completely on social learning, waiting for other agents to innovate new behaviors or for p_\text{copyActWrong} to accidentally give them a new behavior for their repertoire. Since evolution (unlike the classical assumptions of rationality) cares about relative and not absolute payoffs, it didn’t matter to these agents that they were not doing as well as they could be, as long as they were doing as well as (or better than) their opponents[4]. OBSERVE moves and a good estimate of environmental change allowed the agents to minimize their number of non-EXPLOIT moves, and since their exploits paid as well as those of their opponents (whom they were copying), they ended up with equal or better payoff (due to less learning and more exploiting).
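How might an agent estimate p_C from its own limited information? The following Python sketch is a hypothetical estimator, not the method any tournament entry is known to have used. It assumes the agent remembers its own EXPLOIT moves and looks only at the same behavior exploited in consecutive rounds: since payoffs are continuous draws, a changed payoff means the behavior was redrawn, so the fraction of changed pairs estimates the per-round change probability.

```python
def estimate_p_c(exploit_history):
    """Hypothetical estimator of the per-round change probability p_C.

    exploit_history: chronological (round, behavior, payoff) records of the
    agent's own EXPLOIT moves. Only pairs where the same behavior was exploited
    in consecutive rounds are used; because payoffs are continuous draws, a
    different payoff means the behavior was redrawn in between.
    """
    pairs = changes = 0
    for (r0, b0, x0), (r1, b1, x1) in zip(exploit_history, exploit_history[1:]):
        if b1 == b0 and r1 == r0 + 1:
            pairs += 1
            changes += x1 != x0
    return changes / pairs if pairs else None


history = [(0, 3, 5.0), (1, 3, 5.0), (2, 3, 7.2), (3, 3, 7.2), (4, 3, 7.2)]
print(estimate_p_c(history))  # 1 change in 4 consecutive pairs -> 0.25
```

With an estimate of p_C in hand, an agent can decide how long it can keep exploiting before its knowledge is likely stale, and so spend as few rounds as possible on non-EXPLOIT moves.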

Average individual fitness of the top 10 strategies in a homogeneous environment. The best strategy from the multi-strategy competitions is on the left and the tenth best is on the right. Note that the strategies that were best when all 10 strategies were present are the worst when they are alone. This is figure 1D from Rendell et al. (2010).

My view of social learning as an antisocial strategy is strengthened by the strategy’s low fitness when in isolation. The figure to the left shows this result, with data points further to the left corresponding to strategies that did better in the melee. Strategies 1, 2, and 4 are the pure social learners. The height of the data points shows how well a strategy performed when faced only against itself. The strategies that did best in the heterogeneous setting of the 10-strategy melee performed the worst when they were in a homogeneous population with only agents of the same type. This is in line with Rendell, Fogarty, & Laland’s (2010) observation that social learning can decrease the overall fitness of the population. Social learners fare even worse when they can’t make occasional random mistakes in copying behavior: without these errors, all innovation disappears from the population and average fitness plummets. Social learners are free-riding on the innovation of asocial agents.

I would be interested in pursuing this heuristic connection between learning and social dilemmas further. The interactions of learners with each other and the environment can be seen as an evolutionary game: can we calculate the explicit payoff matrix of this game in terms of environmental and strategy parameters? Does this game belong to the Prisoners’ dilemma or Hawk-Dove (or other) region of cooperate-defect games? The heuristic view of innovation as a public good and the lack of stable co-existence of imitators and innovators suggest that the dynamics are PD. However, Rendell, Fogarty, & Laland (2010) show that social learning can sometimes spread better on a grid structure; this is contrary to the effects of PD on grids, but consistent with observations for HD (Hauert & Doebeli, 2004). Since the two studies use very different social learning strategies, it could be the case that, depending on parameters, we can achieve either PD or HD dynamics.
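Answering the "which region?" question requires deriving the payoff matrix, which remains open, but once a matrix is in hand the classification itself is mechanical. Here is a minimal Python sketch using the standard T-S plane partition of symmetric cooperate-defect games (assuming R > P), with innovation playing the role of "cooperate" and imitation the role of "defect". The function name and both example matrices are made up for illustration.

```python
def classify(R, S, T, P):
    """Place a symmetric cooperate/defect game in its region of the T-S plane.

    R: reward for mutual cooperation, S: sucker's payoff,
    T: temptation to defect, P: punishment for mutual defection.
    Assumes R > P; ties on a region boundary are reported as "boundary".
    """
    if not R > P:
        raise ValueError("expected R > P")
    if T > R and S < P:
        return "Prisoners' dilemma"
    if T > R and S > P:
        return "Hawk-Dove"
    if T < R and S < P:
        return "Stag hunt"
    if T < R and S > P:
        return "Harmony"
    return "boundary"


# Made-up payoff matrices, with innovation as "cooperate" and imitation as "defect".
print(classify(R=1.0, S=-0.5, T=1.5, P=0.0))  # Prisoners' dilemma
print(classify(R=2.5, S=2.0, T=3.0, P=0.0))   # Hawk-Dove
```

The two regions differ only in the sign of S - P: whether a lone innovator among imitators is worse off than if everyone imitated (PD) or better off (HD). That is the quantity a derived payoff matrix would need to pin down.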

Regardless of which social dilemma is in play, we know that slight spatial structure enhances cooperation. This means that I expect that if — instead of inviscid interactions — I repeated Rendell et al. (2010) on a regular random graph then we would see more innovation. Similarly, if we introduced selection on the level of groups then groups with more innovators would fare better and spread the innovative strategy throughout the population.

So what does this mean for how I should take my father’s implicit advice? First: stop learning and start doing; I need to spend more time writing up results into papers instead of learning new things. Unfortunately for you, my dear reader, this could mean fewer blog posts on fun papers and more on my boring work! In terms of following research trends or innovating new themes, I think a more thorough analysis is needed. It would be interesting to extend my preliminary ramblings on citation network dynamics to incorporate this work on social learning. For now, I am happy to know that at least some of the things I’m interested in are, in Twitter speak, trending.

Notes and References

  1. Way too broad for my taste, one category was “Mathematics, Computer Science, and Engineering”; talk about a tease-and-trick. After reading the first two items I was excited to see a whole section dedicated to results like theoretical computer science, only to have my dreams dashed by ‘Engineering’. Turns out that Thomson Reuters and I have very different ideas on what ‘Mathematics’ means and how it should be grouped.
  2. Note that my interests weren’t absent from the list: “financial crisis, liquidity, and corporate governance” appeared tenth for “Economics, Psychology, and Other Social Sciences” and was even selected for a special, more in-depth highlight. Evolutionary thinking also appeared in tenth place for the poorly titled “Mathematics, Computer Science and Engineering” area as “Differential evolution algorithm and memetic computation”. It is nice to know that these topics are popular, although I am usually not a fan of the engineering approach to computational models of evolution, since its goal is to solve problems using evolution, not to answer questions about evolution.
  3. High-impact general science publications like Nature, Science, and their more recent offshoots (like the open-access Scientific Reports) are awful at presenting theoretical computer science. It is no different in this case: Papadimitriou and Tsitsiklis (1999) is a worst-case result that requires more freedom in the problem instances to encode the necessary structure for a reduction to known hard problems. Although their theorem is about restless bandits, the reduction needs a more general formulation in terms of arbitrary deterministic finite-dimensional Markov chains instead of the specific distributions used by Rendell et al. (2010). I am pretty sure that the optimal policy for the obvious generalization (i.e. n arms instead of 100, but generated in the same way) of the stochastic environment can be learned efficiently; there is just not enough structure there to encode a hard problem. Since I want to understand multi-armed bandits better anyway, I might find the optimal algorithm and write about it in a future post.
  4. This sort of “I just want to beat you” behavior reminds me of the irrational defection towards the out-group that I observed in the harmony game for tag-based models (Kaznatcheev, 2010).

Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489), 1390-1396.

Hauert, C., & Doebeli, M. (2004). Spatial structure often inhibits the evolution of cooperation in the snowdrift game. Nature, 428(6983), 643-646.

Kaznatcheev, A. (2010). Robustness of ethnocentrism to changes in inter-personal interactions. Complex Adaptive Systems – AAAI Fall Symposium.

Papadimitriou, C. H., & Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Mathematics of Operations Research, 24(2), 293-305.

Rendell, L., Boyd, R., Cownden, D., Enquist, M., Eriksson, K., Feldman, M. W., Fogarty, L., Ghirlanda, S., Lillicrap, T., & Laland, K. N. (2010). Why copy others? Insights from the social learning strategies tournament. Science, 328(5975), 208-213. PMID: 20378813

Rendell, L., Fogarty, L., & Laland, K. N. (2010). Rogers’ paradox recast and resolved: Population structure and the evolution of social learning strategies. Evolution, 64(2), 534-548.