Short history of iterated prisoner’s dilemma tournaments

Nineteen Eighty — if I had to pick the year that computational modeling invaded evolutionary game theory then that would be it. In March, 1980 — exactly thirty-five years ago — was when Robert Axelrod, a professor of political science at University of Michigan, published the results of his first tournament for iterated prisoner’s dilemma in the Journal of Conflict Resolution. Game theory experts, especially those specializing in Prisoner’s dilemma, from the disciplines of psychology, political science, economics, sociology, and mathematics submitted 14 FORTRAN programs to compete in a round-robin tournament coded by Axelrod and his research assistant Jeff Pynnonen. If you want to relive these early days of evolutionary game theory but have forgotten FORTRAN and only speak Python then I recommend submitting a strategy to an analogous tournament by Vincent Knight on GitHub. But before I tell you more about submitting, dear reader, I want to celebrate the anniversary of Axelrod’s paper by sharing more about the original tournament.

Maybe it will give you some ideas for strategies.

Axelrod’s tournaments and TIT-for-TAT

Although something resembling game theory can be traced back almost as far as the birth of probability in the late 17th century, it is only with von Neumann’s 1928 “Zur Theorie der Gesellschaftsspiele” — which by 1944 with the addition of Morgenstern became the classic book Theory of Games and Economic Behavior — that game theory was born as a distinct discipline. By the 70s, game theory had a firm grip on economics, political science, and psychology; and its paradigmatic example of the Prisoner’s dilemma had become prominent in the multidisciplinary branch of the field. Everybody wanted to figure out how to ensure cooperation in the PD, and seeing the single-shot case as clearly hopeless, Axelrod turned to the iterated game. Agents would interact repeatedly with the same partner 200 times, simultaneous deciding — based on the history of their interaction — if to cooperate or defect on that turn.

In an early precursor to crowd-sourcing, Axelrod asked his colleagues to submit strategies for a round-robin tournament with the hope of finding out the secret to cooperation.

That secret was revealed by Anatol Rapoport, who was finishing his tenure as a professor of mathematics and psychology at the University of Toronto, with a 4 line FORTRAN program that implemented TIT-for-TAT. On the first turn the agent would cooperate, and on each subsequent turn it would repeat the move of its partner from the turn before. Axelrod’s tournament was set up so that mutual cooperation would yield 3 points, mutual defection 1 point, exploiting a cooperator would yield 5 points and being exploited would receive 0. Thus, the most points an agent could receive during a pair’s 200 interactions is 1000 (for example, ALL-D exploiting ALL-C), the least is 0 (for example, ALL-C being exploited by ALL-D), constant mutual cooperation would yield 600 (for example, ALL-C with ALL-C or TIT-for-TAT with itself), and constant mutual defection would yield 200 (for example, ALL-D with ALL-D). From the round-robin competition of 5 matches against each of the 13 other strategies and RANDOM (D or C at each turn with equal probability), TIT-for-TAT averaged 504.5 points, edging out the second place — a complicated variant of TIT-for-TAT — by 4.1 points, and beating the worst-performer — RANDOM — by 228.2 points.

Contrary to popular belief, TIT-for-TAT did not make its premier in 1980. It was already an established strategy by then. For example, Oskamp (1971) used it as one of several computer opponent for human players of iterated PD — it encourage a good amount of cooperation from human participants. It was not new to Axelrod, either, since he included it in a preliminary tournament — where it was the second best performer — before inviting the contributors. In fact, since all the contributors were briefed on the preliminary tournament, they were aware of TIT-for-TATs performance and many tried to improve on it for their submission. If you look at the detailed descriptions of the strategies, you will see that many are complicated variants of TIT-for-TAT.

In his analysis, Axelrod attributed TIT-for-TATs success to two — provocatively named — properties: niceness and forgiveness. In the first tournament, all the successful strategies were nice at the start — they cooperated on the first turn — and were forgiving — they tended to resume cooperation after their partner returned to cooperating. Unfortunately, the provocative terminology continues to resonate today, with TIT-for-TAT being touted as an explanation for all kinds of things that are probably not all that well modeled by an iterated PD.

For an amusing example, see “how nice guys undoubtedly finish first” because of “scientific research and evidence” à la Axelrod and TIT-for-TAT:

In September, Axelrod (1980b) followed up with his paper on a second tournament. The tournament had an identical round-robin style and payoff matrix, but instead of a fixed length of 200 rounds, it had a probability of 0.00346 of ending on any given round. This time there were 62 entrants, and even though everyone had a report on the results of the first tournament and could enter any strategy; Anatol Rapoport was the only one to (re)submit TIT-for-TAT and won the tournament again. Axelrod also extended past his previous paper by considering 6 variant tournaments where the interaction ecology was changed by having a particular strategy over-abundant. In this manipulation, TIT-for-TAT still came out on top 5 out of the 6 times, coming in 2nd in the last variant. Finally, although Axelrod did briefly discuss E.coli and thus elude to evolution in his first tournament, it wasn’t until this September paper that he considered evolutionary dynamics of the strategies in repeated tournaments. In fact, he implemented discrete time replicator dynamics — without calling them as such — and showed that after 1000 generations, TIT-for-TAT was leading the pack with 14.5% of the population and the highest continued growth rate at 0.05% per generation.

However, real fame for TIT-for-TAT and Robert Axelrod would come in 1981 when he was joined by William D. Hamilton, an evolutionary biologist at University of Michigan and the founder of inclusive fitness theory (Hamilton, 1964), and published the Science paper based on his tournaments that immortalized reciprocity as a mechanism for the evolution of cooperation.

Knight’s tournament

Vincent Knight is running a tournament similar to Axelrod’s first tournament (1980a). The key difference is that instead of maximizing the payoff, in Knight’s variant you want to minimize your payoff. In this version, mutual cooperation yields 2 points, mutual defection 4 point, exploiting a cooperator would yield 0 points and being exploited would receive 5. Note that if you want to translate between Axelrod’s maximization version and Knight’s minimization version, just subtract they payoff matrix from 5. The interaction still lasts for 200 rounds, and instead of 5 repetitions, our computers let us do 50 with each other strategy.

I hope that in future version, Knight will consider adding replicator dynamics from tournament to tournament and the possibility for errors in perception or action that have allowed many of the interesting strategic variants in the last 35 years.

If you want to submit a strategy then you can with a pull request. For a more detailed overview, see Vincent Knight’s video guide:

Of course, just because TIT-for-TAT was the winner in Axelrod’s first two tournaments, doesn’t mean that it will win in this one. The (preliminary) results so far from Knight’s tournament show that the ecology of strategies is completely different. The results are below, remember that a lower average payoff per round is better than higher. The position of TIT-for-TAT is highlighted for a green vertical line and the position of RANDOM by a purple vertical line. The green horizontal line highlights the performance of TIT-for-TAT from Axelrod’s original tournament translated to the new units (subtract from 1000 and divide by 200), a similar purple horizontal line for RANDOM would have been at 3.859, much worse than any strategy in Knight’s tournament. Since RANDOM does not adapt to its environment, it is a good proxy for the cooperativeness of the ecology, a decrease from 3.859 to 2.832 in Knight’s minimization version (or, a corresponding increase from 228.2 to 433.6 in Axelrod’s maximization units) means that the amount of cooperation in this ecology is much higher than for Axelrod’s tournament.

Knight_results

So winning this tournament might not be as simple as submitting TIT-for-TAT again. The state of the art in recent tournaments has been to submit several strategies instead of one. The strategies then engage in a short identification dance in the beginning to see if they are interacting with each other or with a foreign strategy. If they identify each other then one is predetermined to be the winner and defect while the partner cooperates to extract the best results. If the strategies identify a foreign opponent then the predetermined leader tries to continue with a reasonable strategy (say TIT-for-TAT) while the predetermined feeders defect against the foreign strategies to lower the overall levels of cooperation in the ecology. This results in dismal performances for the feeder strategies, but if there are enough of them then it can be amazing payoff for the leader strategy. Of course, the trick is to come up with the shortest and hardest to fool initial dance.

What strategy will you submit, dear reader? More generally, what is a good way to measure the cooperative potential of an ecology of iterated PD strategies? Can we estimate it without running the tournament?

References

Axelrod, R. (1980a). Effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24(1): 3-25.

Axelrod, R. (1980b). More effective choice in the prisoner’s dilemma. Journal of Conflict Resolution, 24 (3), 379-403 DOI: 10.1177/002200278002400301

Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489), 1390-1396.

Hamilton, W.D. (1964). The genetical evolution of social behaviour 1. J. Theoret. Biol., 7: 1-16.

Oskamp, S. (1971). Effects of programmed strategies on cooperation in the prisoner’s dilemma and other mixed-motive games. Journal of Conflict Resolution, 225-259.

von Neumann, J, (1928). Zur theorie der gesellschaftsspiele. Mathematische Annalen, 100(1): 295-320.

von Neumann, J., & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.

Advertisements

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

10 Responses to Short history of iterated prisoner’s dilemma tournaments

  1. Pingback: Short history of iterated prisoner’s dile...

  2. Abel Molina says:

    The possibility of submitting multiple “prisoners” seems to change the payoff of the situation, since it only matters how the best one does (or maybe I misunderstand). It seems like it would be closer to the spirit of the problem to judge such entries by their average or median payoff instead.

    • Definitely, it would be better to judge it that way, but this was a hack that people noticed in the last competition using the old scheme (for your scheme, I could just submit under different names, eh ;)). Instead of median or mean, though, I would use replicator dynamics as the judge; especially if one wants to tell an evolutionary story. That way wrecking and otherwise manipulating the ecology can be a viable strategy as long as you can do it dramatically enough to get your ‘head’ strategy out ahead before the ‘support’ strategies go extinct.

      • Abel Molina says:

        Yeah, the whole fact that a person can write multiple algorithms changes the payoff of the meta-game to a different one, and only the honor system of writing a single algorithm (or something functionally equivalent) stops this. The replicator dynamics definitely allow for an easier interpretation of the results in terms of benefits for survival of cooperation/defection – only this time applied to the survival of groups rather than individuals.

  3. Pingback: A week of links | EVOLVING ECONOMICS

  4. rose says:

    Is there anyone help me about what are main differences between iterated prisoner’s
    dilemma and finetely repeated prisoner’s dilemma?

  5. Pingback: The Most Famous Game Theory Tournament, Predicting March Madness, Explaining The Origins Of Life – Game Theory News (March 2015) | Mind Your Decisions

  6. Hey Artem, for info: we have changed the default behaviour to use conventional RPTS and players now aim to maximise their utility.

  7. Pingback: Cataloging a year of blogging | Theory, Evolution, and Games Group

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s