# Measuring games in the Petri dish

For the next couple of months, Jeffrey Peacock is visiting Moffitt. He’s a 4th year medical student at the University of Central Florida with a background in microbiology and genetic engineering of bacteria and yeast. Together with Andriy Marusyk and Jacob Scott, he will move to human cells and run some in vitro experiments with non-small cell lung cancer — you can read more about this on Connecting the Dots. Robert Vander Velde is also in the process of designing some experiments of his own. Both Jeff and Robert are interested in evolutionary game theory, so this is great opportunity for me to put my ideas on operationalization of replicator dynamics into practice.

In this post, I want to outline the basic process for measuring a game from in vitro experiments. Games in the Petri-dish. It won’t be as action packed as Agar.io — that’s an actual MMO cells-in-Petri-dish game; play here — but hopefully it will be more grounded in reality. I will introduce the gain function, show how to measure it, and stress the importance of quantifying the error on this measurement. Since this is part of the theoretical preliminaries for my collaborations, we don’t have our own data to share yet, so I will provide an illustrative cartoon with data from Archetti et al. (2015). Finally, I will show what sort of data would rule-out the theoretician’s favourite matrix games and discuss the ego-centric representation of two-strategy matrix games. The hope is that we can use this work to go from heuristic guesses at what sort of games microbes or cancer cells might play to actually measuring those games.

### Measuring the gain function

Suppose you found your two competitive cell lines.[1] What’s next? Here starts my fun and a game theory detour. Suppose the proportion of cell line A in a mixture is p, with cell line B making up 1 – p, and the fitnesses of the cells are $w_A(p)$ and $w_B(p)$, respectively. Then the replicator dynamics are given by:

$\frac{dp}{dt} = p(1 - p)\underbrace{(w_A - w_B)}_\text{gain function}$

Our goal in identifying the game is to measure this gain function. For this we will use a simple calculus trick: consider the log-odds s = ln(p/(1 – p)). Then

$\frac{ds}{dt} = \frac{dp}{dt}(\frac{1}{p} + \frac{1}{1 - p}) = \frac{dp/dt}{p(1 - p)} = w_A - w_B$

By looking at the log-odds of p instead of just p, we have ‘factored out’ the logistic growth part of the equation. Now, to measure the gain function, we just have to measure the derivative of s. Unfortunately, the experimentalists do not have a derivative detector in the lab, so we have to approximate the derivative by looking at the change in s over a short period of time.[2]

In the case of the basic replating experiments — like I described in my first and second post on operationalization — you have a natural discretization of time: $p_\text{in}$ can be the proportion of type-A cells at the start of your experiment, and $p_\text{out}$ can be the proportion of type-A cells when you replate. You can then run the experiments for several initial values $p_\text{in}$ and plot your results as $\Delta s = \ln\frac{p_{out}}{1 - p_{out}} - \ln\frac{p_{in}}{1 - p_{in}}$ versus $p_\text{in}$. This graph is your gain function.

### Tracking measurement error

Since we are considering experimental data, it is important to look at the errors associated with our measurements. I don’t mean the variance between different runs in different Petri dishes, although that is also important, but the accuracy of the proportions of our initial seeds and the precision of our measurements. For example, if you measure with a ruler that has millimeter markings, you can’t say that you’ve measured the length x to better than $x \pm 0.5 \;\text{mm}$. The case is similar for these studies. Each experimental set up will serve as a different meter stick, and we will need to do the metrology for each. But experimentalists have an intuition for the order of magnitude of imprecision to expect. In the case of measuring proportions in experiments like Archetti et al. (2015) or the ones we are building, it is not unreasonable to expect a precision of $p \pm 0.015$.[3]

The reason I want to discuss error explicitly — apart from my physics conditioned response that it should be mandatory to present precision estimates alongside all empirical measurements — is that $\Delta s$ amplifies the error in $p_\text{in}, p_\text{out}$ and does so in a nonlinear fashion. There are several ways we could propagate the errors from p to s, but for these estimates, I will stick to the back of the envelope:

\begin{aligned} \sigma_{\Delta s} & = \sqrt{(\frac{\partial}{\partial p_\text{in}}\Delta s)^2 \sigma_{p_\text{in}}^2 + (\frac{\partial}{\partial p_\text{out}} \Delta s)^2 \sigma_{p_\text{out}}^2} \\ & = \sigma_p \sqrt{\frac{1}{p_\text{in}^2(1 - p_\text{in})^2} + \frac{1}{p_\text{out}^2(1 - p_\text{out})^2}} \end{aligned}

So the error is amplified by $2\sqrt{2}$ near $p_\text{in} = p_\text{out} = 0.5$ and the amplification increases as the proportions approach 0 or 1. This can lead to very big uncertainties in the gain function. Consider, for example, the most convincing case for nonlinearity in Archetti et al. (2015):[4]

Based on the data for 5% serum in figure 3C of Archetti et al. (2015). Plotting $\Delta s$ versus $p_\text{in}$, to provide an empirical estimate of the gain function for producing IGF-II. With p being the proportion of Beta-tumor cells derived from insulinomas of Rip1Tag2 mice, and 1 – p being the proportion of cells from the same mice but with a homozygous deletion of the IGF-II gene; $p_\text{out}$ is after 5 days of coculture. The error bars for the leftmost data point extend off the plot range on both sides.

We can see that the error bars become more and more exaggerated as we approach 0 or 1 for $p_\text{in}$. This continues to the point that the leftmost (and maybe also the rightmost) data-point are completely uninformative. However, this is sufficient for Archetti et al.’s (2015) main point. They just have to rule out a constant flat line as their gain function, and for me — especially given the hand-wavy approximations I used — this graph accomplishes this. Although I would be happy to fit a quadratic gain function here, but we can leave that exploration for a later post.

### Ego-centric representation of matrix games

In general, if (and only if) the $\Delta s$ versus $p_\text{in}$ data can be fit well by some line $\Delta s = a p_{in} + b$ then the interaction between your two cells can be modeled by the inviscid replicator dynamics of a matrix (i.e. two-player linear) game (or quadratic public good).[5] The gain function is sufficient to recreate the dynamics but it is insufficient to pick out the full game matrix uniquely. The gain function only has enough information to find the matrix up to arbitrary independent additive factors for each column — the ego-centric representation of the game.

In an old post, I showed the U-V parameterization of 2-strategy matrix games. Here, I would like to use another parameterization that is more in line with measuring the gain function. Like the gain function, this representation focuses on the choice between strategies that an individual agent has to make — it is an ego-centric representation — and thus does not preserve — or make easy to see — bilateral properties of the interaction pair like Pareto efficiency.

To build this representation, consider an arbitrary 2-strategy matrix game with independent arbitrary elements subtracted from each column:

$\begin{pmatrix} R - c_1 & S - c_2 \\ T - c_1 & P - c_2 \end{pmatrix}$

Notice that the fitnesses for the two strategies in this game are given by:

\begin{aligned} w_A & = p(R - c_1) + (1 - p)(S - c_2) \\ w_B & = p(T - c_1) + (1 - p)(P - c_2) \end{aligned}

The gain function is then

$w_A - w_B = p(R - T) + (1 - p)(S - P)$

and independent of $c_1, c_2$. Thus we can choose these two values arbitrarily, without changing the game dynamics — kind of how we subtracted a common offset in the U-V parameterization. Let us select $c_1 = T$ and $c_2 = S$, to give us a dynamically-equivalent game:

$\begin{pmatrix} a + b & 0 \\ 0 & b \end{pmatrix}$

with $a := (R - T) - (P - S)$ and $b := P - S$.[6] Note that these are the same a, b as from our linear fit. They split our game-space into four distinct regions:[7]

• If b > 0 and a + b > 0 then the population always converges towards ALL-A.
• If b < 0 and a + b < 0 then the population always converges towards ALL-B.
• If b > 0 and a + b < 0 then the population always converges towards the internal fixed point at $p^* = - \frac{b}{a}$
• If b < 0 and a + b > 0 then the population bifurcates around the internal fixed point at $p^* = - \frac{b}{a}$; i.e. if the initial proportion is higher than p* then it goes to ALL-A, if it is lower than p* then it goes to ALL-B.

These are the only dynamics possible for a matrix game. However, we can get more possible dynamics — like multiple internal fixed points — if the empirical gain function is not well represented by the line $\Delta s = a p_{in} + b$. Unfortunately, the evolution of cooperation and mathematical oncology literature seldom considers non-matrix games. Thus, if we consistently find that the Petri dish is not well described by a linear gain function then we will need to adjust what sort of heuristic models we build, or explain how stochasticity or space (or some other factor) create nonlinear gain functions from matrix games.

### Notes and References

1. Finding such cell lines is a far from trivial task, and requires great experimental expertise. Originally I included a section of the difficulties involved in this, but to amortize your reading this has been moved to a later post dedicated to the difficulties of theorists like me understanding experiments.
2. Here we run into the important question of “how short is short enough”? If we run the experiment for too short of a time then the change in p will be overwhelmed by the measurement noise, but if we run for too long before measuring $p_\text{out}$ then it doesn’t make sense to say we are measuring the derivative. To overcome this, we have to focus on the tangent line of our function as a local linearization. If we have access to high-resolution in time data of the growth during our preliminary experiments then we can plot the resulting growth curve as we would normally: proportion versus time. We then want to take the biggest time-window on which this growth curve is well-approximated by a constant-fitness logistic growth. This logistic growth is the linearization of our function, and as long as it is in approximate agreement, we can say that we are measuring the derivative.
3. Note that I am violating my own rules on abusing numbers here. Since p is logically constrained to be between 0 and 1, it doesn’t make sense to have a proportion like $0.99 \pm 0.015$. In this post I won’t look at proportions below 0.015 or above 0.985 and just want a quick back of the envelope calculation. A more detailed quantification of uncertainty would require me to delve further into the details of experimental set up, and I want to save that for another time. So, dear reader, I hope you will forgive me.
4. Note that this is meant as a rough illustrative example. A cartoon of sorts. The numbers I am taking as single measurements of $p_\text{out}$ are actually a mean from several independent runs for a given $p_\text{in}$. Further, the assumption of 0.015 precision is based on my own guess — after discussing with experimentalists — and not taken directly from Archetti et al.’s (2015) work. In fact, it seems to me that they have not quantified their single measurement error, although they did a careful job of presenting the statistics of error from multiple runs. Of course, presenting error from multiple runs can be just as good of a method of capturing measurement error. Consider my treatment to be a very crude simplification. If there is interest in a more careful reanalysis of Archetti et al. (2015) then let me know and I will contact the authors for their raw numbers.
5. This is a phenomenological statement, about how good replicator dynamics fit a given dataset. It could be that you have some other game dynamics, for example one that explicitly models space or stochasticity, that has pair-wise linear interactions for its micro-dynamics but that does not lead to a matrix game at the level of just population proportions. I am not aiming to rule out these cases, since I want to remain as agnostic as I can to the reductionist story involved. In other words, I am not aiming to answer: is this really a game? Instead, I aim to answer: is it useful to model it as a game at the macroscopic level of description?

Note that it is necessary to add effects like space to make a matrix game at the individual level result in a nonlinear gain function at the population level, but it is not sufficient. For example, the Ohtsuki & Nowak (2006) transform preserves linearity of the gain function — although changing the a, b parameters of a fit — while introducing an approximate account of spacial structure.

6. In fact, it is possible to simplify this to a piecewise one-parameter characterization. If a > 0 then we can divide by a without changing the game dynamics. This would be equivalent to picking different units of time or units of fitness. If we let $p^* := -\frac{b}{a}$ then the matrix simplifies to the single parameter:

$\begin{pmatrix} 1 - p^* & 0 \\ 0 & -p^* \end{pmatrix}_+$

alternatively, if a < 0 then we can divide by -a, letting us simplify to the matrix:

$\begin{pmatrix} -(1 - p^*) & 0 \\ 0 & p^* \end{pmatrix}_-$

together, these give us our four dynamic regimes: (1) ALL-A if a > 0 and $p^* \leq 0$ or a < 0 and $p^* \geq 1$; (2) ALL-B if a > 0 and $p^* \geq 1$ or a < 0 and $p^* \leq 0$; and if 0 < p* < 1 then (3) internal attractor at p* when a < 0, and (4) bifurcation around internal repulsor at p* if a > 0.

7. There is a boring 5th region that I do not discuss explicitly: neutral drift between the strategies when a = b = 0.

Archetti, M., Ferraro, D.A., & Christofori, G. (2015). Heterogeneity for IGF-II production maintained by public goods dynamics in neuroendocrine pancreatic cancer. Proceedings of the National Academy of Sciences of the United States of America, 112 (6), 1833-8 PMID: 25624490

Ohtsuki H, & Nowak MA (2006). The replicator equation on graphs. Journal of Theoretical Biology, 243(1): 86-97. PMID: 16860343

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

### 15 Responses to Measuring games in the Petri dish

1. Arne says:

We have attempted something similar in spirit with bacteria.
Looking at frequencies alone, it looked like coordination. But
taking into account growth led to different conclusions,
http://rsif.royalsocietypublishing.org/content/12/108/20150121
In that case, things were more subtle than anticipated…
Arne

• Welcome to the comments of TheEGG, Arne! Good to have you here.

I am ashamed to say that I had not read that paper before, so I ran a reading group on it this past Friday. It was a really cool experiment, but I don’t agree with your interpretation that the model leads to a different conclusion than the coordination game. I discuss your paper at length in my most recent post. I’d appreciate continuing our discussion in the comments there. Especially if I misunderstood the paper.

To give an executive summary: when $K_C = K_D$, the model in eq. (3.1) is equivalent to

\begin{aligned} \frac{dx}{dt} & = x(1 - x)(r_C - r_D) \\ \frac{dN}{dt} & = N(K - N)\frac{(xr_C + (1 - x)r_D)}{K} \end{aligned}

where the first equation is the replicator dynamics of C vs. D, and completely independent of density (the second equation). In the more general case of $K_C \neq K_D$, the units in eq. (3.1) don’t seem to check out, as I discuss in footnote 6 of the above-linked post.

2. Marc says:

Once upon a time I investigated using Bayesian inference to infer relative fitness parameters in the Moran process: http://arxiv.org/abs/1303.4566 . One could use the basic idea to infer any game matrix or parameterized fitness landscape.

This site uses Akismet to reduce spam. Learn how your comment data is processed.