Hunger Games themed semi-iterated prisoner’s dilemma tournament
July 28, 2013
With all the talk surrounding it, crowdsourcing science might seem like a new concept, and for citizen science efforts it might be, but sourcing your research out to other researchers is definitely an old trick. In fact, evolutionary game theory was born (or at least popularized) by one such crowdsourcing exercise; in 1980, Robert Axelrod wanted to find the best strategy for the iterated prisoner’s dilemma and reached out to prominent researchers for strategy submissions to a round-robin tournament. Tit-for-tat was the winning strategy, but the real victor was Axelrod. His 1981 paper with Hamilton analyzing the results went on to become a standard reference in applications of game theory to social questions (at least outside of economics), agent-based modeling, and — of course — evolutionary game theory. Of Axelrod’s sizeable 47,222 (at the time of writing) citations, almost half (23,370) come from this single paper. The tradition of tournaments continues among researchers; I’ve even discussed an imitation tournament on imitation previously.
The cynical moral of the tale: if you want to be noticed, then run a game theory tournament. The folks at Brilliant.org — a website offering weekly olympiad-style challenge problems in math and physics — took this message to heart, coupled it with the tried-and-true marketing technique of linking to a popular movie/book franchise, and decided to run a Hunger Games themed semi-iterated prisoner’s dilemma tournament. Submit a quick explanation of your strategy and a Python script to play the game, and you could be one of the 5 winners of the $1,000 grand prize. Hooray! The submission deadline is August 18th, 2013, and all you need is a Brilliant account, which appears to be free. If you are a reader of TheEGG blog then I recommend submitting a strategy and discussing it in the comments (either before or after the deadline); I am interested to see what you come up with.
I will present the rules in more standard game-theoretic terms, without the Hunger Games garb. The tournament is between an unknown number of players and is run for an unknown number of cycles (i.e. the game runs for at least $m$ cycles, and then has a probability $\epsilon$ of ending on each cycle after that; both parameters are unknown). Each player starts with $s$ points, and if at the end of a cycle you have 0 or fewer points then you are eliminated from the game; let $N_t$ be the number of players alive on cycle $t$. The game ends when either one player remains or the rounds run out; the player with the most points at the end of the game wins (the other 4 grand-prize winners are selected by committee) and each surviving player gets an “I survived” t-shirt.
From your perspective, on each cycle you simultaneously play an independent prisoner’s dilemma game against each of the other players, for a total of $N_t - 1$ games. Each game has the payoff matrix:

$\begin{pmatrix} 0 & -3 \\ 1 & -2 \end{pmatrix}$

where rows are your action (cooperate, then defect), columns are your opponent’s, and entries are your payoff.
Note that the game is additive: there is no synergistic effect from cooperating with a cooperator. No matter what you do, if your opponent cooperates then you get 3 more points than if they defect.
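To make the additivity concrete, here is a minimal sketch in Python (the tournament’s submission language). The specific payoff entries are my reading of the contest rules; the text above only guarantees the “+3 for facing a cooperator” structure.

```python
# Payoff to you from a single pairwise game, keyed by
# (your move, their move). The entries encode an additive game:
# facing a cooperator is worth +3 regardless of your own move,
# and cooperating costs you 1 point regardless of theirs.
PAYOFF = {
    ('C', 'C'): 0, ('C', 'D'): -3,
    ('D', 'C'): 1, ('D', 'D'): -2,
}

# Additivity check: no synergy term for mutual cooperation.
for mine in ('C', 'D'):
    assert PAYOFF[(mine, 'C')] - PAYOFF[(mine, 'D')] == 3
for theirs in ('C', 'D'):
    assert PAYOFF[('D', theirs)] - PAYOFF[('C', theirs)] == 1
```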
At the start of each cycle, a random public number $m_t$ is drawn (presumably uniformly at random). This number sets the threshold for a public goods game: if the total number of cooperative actions (over all players) is $m_t$ or greater, then each player (even ones that never cooperated) receives an additional $P$ points. Since your strategy just has to survive (and there is no reproduction), it is obvious that defection is optimal.
To make the game interesting, the developers introduce a reputation effect. At the start of each cycle, each living player has a reputation score equal to the proportion of times she cooperated in the past; on the first cycle (before any interactions have happened) this number is set to zero for all players. At the start of each cycle, you are given the round number $t$, the number of points you have, the randomly drawn public-good threshold $m_t$, and an array of $N_t - 1$ values $r_1, \dots, r_{N_t - 1}$ corresponding to the reputations of the remaining players, presented in a randomized order. You have to output an array of $N_t - 1$ values, where the $i$-th value says whether you will cooperate with or defect against the player with reputation $r_i$.
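A sketch of what such a player could look like in Python; the function name, argument names, and the 'C'/'D' encoding are my own assumptions for illustration, not the actual Brilliant.org API:

```python
import random

def hunger_games_player(t, points, threshold, reputations):
    """Hypothetical player interface: given the cycle number t, your
    current points, the public-good threshold, and the (randomly
    ordered) reputations of the other living players, return one
    'C'/'D' decision per opponent, in the same order.
    """
    decisions = []
    for rep in reputations:
        # Toy rule: reciprocate in proportion to the opponent's
        # reputation (a reactive strategy, not a recommendation).
        decisions.append('C' if random.random() < rep else 'D')
    return decisions
```

Note that nothing in the decision can depend on *which* opponent is which: the reputation array is the only handle you have on the other players.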
It is not clear if you are allowed any extra internal memory that carries over from round to round, but I will assume not. Since the other players have no indexing number (their order in the array is randomized every cycle), you have only their reputation on which to base your strategy, and hence the game (like the one the neural nets played in Keven’s first post) is not truly iterated. However, suppose you survive into the ‘late’ stage of the game, when most players are dead and the remaining players have randomized strategies (i.e. you don’t assume how they cooperated or defected, just that their total number of cooperations is not a function of their history alone, but of history and a random seed). Then with high probability each of the remaining players will have a distinct reputation, and if you are allowed memory then you can track them from then on by reputation. Even so, you still won’t know how they acted toward you in the last round, only a summary statistic across all players.
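If round-to-round memory is allowed, the tracking could be sketched as a greedy nearest-reputation matching; the 0.05 drift tolerance is an arbitrary illustrative value, and a real tracker would bound how far a reputation can move in one cycle:

```python
def match_by_reputation(prev_reps, curr_reps):
    """Match this cycle's reputations to last cycle's, assuming
    reputations are (almost surely) distinct late in the game.
    Returns a dict: current index -> previous index, or None when
    no previous reputation is close enough (e.g. that player died).
    """
    mapping = {}
    used = set()
    for i, r in enumerate(curr_reps):
        # Find the closest not-yet-claimed previous reputation.
        candidates = [(abs(r - p), j) for j, p in enumerate(prev_reps)
                      if j not in used]
        if not candidates:
            mapping[i] = None
            continue
        dist, j = min(candidates)
        if dist < 0.05:  # hypothetical drift tolerance
            mapping[i] = j
            used.add(j)
        else:
            mapping[i] = None
    return mapping
```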
I think I will write a player for this game, and report my analysis and detailed strategy in a future post. For now though, I will leave you with some points I think are noteworthy:
- It is important to estimate, before coding, what values of $m$ and $\epsilon$ the developers are likely to use; $m$ is much easier to estimate tightly than $\epsilon$.
- As you play the game, it is important to refine your estimates dynamically since an optimal strategy will depend on knowing these parameters.
- The other important factor is to learn the conversion factor between reputation and points (i.e. how much a cooperation is worth in terms of the likelihood of the population cooperating with you in return). Of course, this is extremely difficult to estimate well, since it depends on the strategy of every other player. The only objective thing you can say is that since reputation is a cumulative statistic, it will become less valuable as time goes on (i.e. cooperation will become less useful in the long term).
- Since the game is additive, it doesn’t matter if you cooperate with cooperators or defectors; for reputation, only the proportion of your cooperative actions matters, not with whom they were taken. As such, choosing whether to dish out your cooperation to low- or high-reputation players is only useful as a way of manipulating the reputation-to-points conversion factor.
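The depreciation of reputation mentioned above can be made precise: since reputation is the fraction of your past decisions that were cooperative, after $n$ decisions a single extra cooperation (versus a defection) swings your next reputation by exactly $1/(n+1)$. A one-line sketch:

```python
def reputation_swing(n):
    """With k cooperations in n past decisions, cooperating once more
    yields reputation (k+1)/(n+1) versus k/(n+1) for defecting: the
    difference is 1/(n+1) regardless of k, and shrinks with n.
    """
    return 1.0 / (n + 1)

# Early on a single cooperation is visible; hundreds of pairwise
# games in, it barely moves the statistic.
assert reputation_swing(9) == 0.1
assert reputation_swing(999) == 0.001
```

Since you play $N_t - 1$ games per cycle, $n$ grows quickly, so the window in which cooperation buys reputation cheaply is short.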
Once you have good estimates for the above parameters, you should act rationally to maximize expected profit given the extinction constraint.
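As a toy illustration of acting rationally under the extinction constraint, here is a hypothetical decision rule; the function name, the numeric defaults (cooperation cost, estimated value per cooperation, and a worst-case loss of 3 points from cooperating against a defector), and the buffer logic are all placeholders that a real player would estimate online:

```python
def choose_num_cooperations(points, n_opponents, coop_cost=1.0,
                            est_value_per_coop=1.5, worst_case_loss=3.0):
    """Cooperate with everyone if a cooperation's estimated
    reputation-to-points value exceeds its direct cost, but never
    risk being eliminated (0 or fewer points) this cycle.
    """
    if est_value_per_coop <= coop_cost:
        return 0  # cooperation doesn't pay; defect across the board
    # Worst case, every game you cooperate in costs worst_case_loss
    # points; keep at least 1 point in reserve to survive the cycle.
    max_affordable = max(0, int((points - 1) // worst_case_loss))
    return min(n_opponents, max_affordable)
```

The interesting work, of course, is in producing the estimates this rule consumes, not in the rule itself.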
Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211(4489), 1390-1396. DOI: 10.1126/science.7466396