Hunger Games themed semi-iterated prisoner’s dilemma tournament

hungerGamesCodeWith all the talk surrounding it, crowdsourcing science might seem like a new concept and it might be true for citizen science efforts, but it is definitely an old trick to source your research to other researchers. In fact, evolutionary game theory was born (or at least popularized) by one such crowdsourcing exercise; in 1980, Robert Axelrod wanted to find out the best strategy for iterated prisoner’s dilemma and reached out to prominent researchers for strategy submissions to a round-robin tournmanet. Tit-for-tat was the winning strategy, but the real victor was Axelrod. His 1981 paper with Hamilton analyzing the result went on to become a standard reference in applications of game theory to social questions (at least outside of economics), agent-based modeling, and — of course — evolutionary game theory. Of Axelrod’s sizeable 47,222 (at time of writing) citations, almost half (23,370) come from this single paper. The tradition of tournaments continues among researchers, I’ve even discussed an imitation tournament on imitation previously.

The cynical moral of the tale: if you want to be noticed then run a game theory tournament. The folks at Brilliant.org — a website offering weekly olympiad-style challange problems in math and physics — took this message to heart, coupled it to the tried-and-true marketing technique of linking to a popular movie/book franchise, and decided to run a Hunger Games themed semi-iterated Prisoner’s dillema tournament. Submit a quick explanation of your strategy and Python script to play the game, and you could be one of the 5 winners of the $1,000 grand prize. Hooray! The submission deadline is August 18th, 2013 and all you need is a Brilliant account and it seems that these are free. If you are a reader of TheEGG blog then I recommend submitting a strategy, and discussing it in the comments (either before or after the deadline); I am interested to see what you come up with.

I will present the rules in more standard game-theoretic terms, without the Hunger Games garb. The tournament is between an unknown number N_1 of players and is run for an unknown number T = t + \mathrm{Geom}(r) of cycles (i.e. the game runs for at least t cycles, and then has a probability r of ending on any cycle after that; both parameters are unknown). Each player starts with 300(N_1 - 1) points, and if at the end of a cycle you have 0 or fewer points, you are eliminated from the game; let N_i be the number of players alive on cycle i. The game ends when either one player remains or T rounds expire; the player with the most points at the end of the game wins (the other 4 grand prize winners are selected by committee) and each surviving player gets an “I survived” t-shirt.

From your perspective, at each cycle you simultaneously play an independent prisoner’s dilemma game against each of the other players for a total of N_i - 1 games. Each game has the payoff matrix:

\begin{pmatrix}  0 & -3 \\ 1 & -2  \end{pmatrix}

Note that the game is additive, there is no synergetic effect from cooperating with a cooperator. No matter what you do, if your opponent cooperated then you get 3 more points than if the defect.

At the start of each cycle, a random public number 0 \leq m \leq N_i(N_i - 1) is drawn (presumably uniformly at random). This number sets the threshold for a public goods game: if the total number of cooperative actions (over all players) is m or greater then each player (even ones that never cooperated) receives an addition 2(N_i - 1) points. Since your strategy just has to survive (and there is no reproduction), it is obvious that defection is optimal.

To make the game interesting, the developers introduce a reputation effect. At the start of each cycle, each living player j has a reputation score R_{ij} equal to the proportion of times she cooperated in the past; on the first cycle (before any interactions are had) this number is set to zero for all players. At the start of each cycle, you are given the round number i, the amount of points you have L_i, the randomly drawn public-good threshold m_i, and an array of N_i - 1 values R'_i[j] corresponding to the reputations of the remaining players presented in a randomized order. You have to output an array A_i[j] of N_i - 1 values corresponding to if you will cooperate or defect from the player that has reputation R'_i[j].

It is not clear if you are allowed to have any extra internal memory that carries over from round-to-round, but I will assume no. Since the other players have no indexing number (their order in the array is randomized every cycle), you have only their reputation to base strategy and hence the game (like the one neural nets played in Keven’s first post) is not truly iterated. However, if we assume that you survive into the ‘late’ stage of the game and most players are dead and the remaining players had randomized strategies (i.e. you don’t assume how they cooperated or defected, just that their total number of cooperations is not a function of their history, but of history and a random seed) then with high probability each of the remaining players will have a distinct reputation and if you are allowed memory then you can track them from then on by reputation. However, you still won’t know how they acted toward you in the last round, only a summary statistic across all players.

I think I will write a player for this game, and report my analysis and detailed strategy in a future post. For now though, I will leave you with some points I think are noteworthy:

  • It is important to estimate before coding what value t and p the developers are likely to use; t is much easier to estimate tightly than p.
  • As you play the game, it is important to refine your estimates dynamically since an optimal strategy will depend on knowing these parameters.
  • The other important factor to learn the conversion factor between reputation (i.e. how much is a cooperation worth in terms of likelihood of the population cooperating with you in return) and points. Of course, this is extremely difficult to estimate well since it depends on the strategy of every other player. The only objective thing you can say is that since reputation is a cumilative statistic, it will become less valuable as time goes on (i.e. cooperation will become less useful in the long-term).
  • Since the game is additive, it doesn’t matter if you cooperate with cooperators or defectors, and for reputation affects only the proportion of your cooperative actions matters, not with whom they were. As such, choosing if to dish out your cooperation to low or high reputation players is only useful as a way of manipulating the reputation-to-points conversion factor.

Once you have good estimates for the above parameters, you should act rationally to maximize expected profit given the extinction constraint.

ResearchBlogging.orgAxelrod, R., & Hamilton, W.D. (1981). The evolution of cooperation Science, 211 (4489), 1390-1396 DOI: 10.1126/science.7466396

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

12 Responses to Hunger Games themed semi-iterated prisoner’s dilemma tournament

  1. Correction: state can be preserved in the object oriented approach, see last example.

  2. Tom says:

    The sample code given on the site seems to imply a freedom to store state in your module, unless I’m reading it wrong. Just FYI :D

    • Tom says:

      You appear to have corrected yourself in a comment about 10 hours before I did…wonder why that comment didn’t show up when I first loaded the page? Frown. Oh well, the correction is there either way, thanks!

  3. Daniel says:

    Reading this on mobile, half your latex isn’t rendered, so it’s rather difficult to read.

    • My bad, some of that LaTeX didn’t render because I just typed $ to start them instead of the $ followed by ‘latex’ that WordPress requires. Sorry about that, I tend to make this error when I type up a post quickly, I have fixed now. Let me know if you have any further problems.

  4. Mithrandir says:

    You can also know how each other player acted against you in the previous round. There’s a method your bot might implement which will get called at the end of the round and returns the food earned from each game in the round.

    • Ahh, thank you! I really should have looked at the sample code more closely. I will write about Tit-for-Tat variants in an upcoming post to clear up all the confusion I caused.

      I suspect most bots will be sorting the array they receive by reputation to implement approximate player tracking. Kind of awkward that the tournament organizers just don’t give you the array sorted (with ties randomized each cycle). Would save everybody the N_i \log N_i overhead of sorting each cycle.

      • Mithrandir says:

        Well, by giving it sorted you could have some glimpses on the actual player identities. At least in the end rounds.

        Also, they said in the FAQ that the running time doesn’t matter as long as the algorithm is sound. So an O(N^5) would get it’s time to run if it is properly argumented in the paper submitted with the code.

        • Giving it sorted wouldn’t give you any extra information (as long as ties are randomized from cycle to cycle and maybe player to player, but not as essential) because you can sort it yourself to have the same information (hence, nothing new is learned, it is already in the completely randomized input, you just have to spend the time to extract it).

          Yeah, I wasn’t concerned about the time from my point of view, but from the point of view of the organizers. Since most people will be sorting, why not just give it sorted and save your servers the effort of everybody running their own sorting algorithm. In the case of the organizers, the sorting would even be slightly easier (although asymptotically still N \log N, I think) since they can do in-place updating after each cycle instead of a blind resort each player would have to do. But I guess they are an educational website, so maybe it is pedagogical to teach people how to do alignments?

  5. Pingback: Stats 101: an update on readership | Theory, Evolution, and Games Group

  6. Pingback: Cataloging a year of blogging: from behavior to society and mind | Theory, Evolution, and Games Group

  7. Pingback: Why academics should blog and an update on readership | Theory, Evolution, and Games Group

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: