## Replicator dynamics and cognitive cost of agency

During the June 3rd Friday meeting of the LNSC, Akash Venkat asked a fun question about how simple it is to remember your strategy. I interpreted this as a question about how difficult it is to have a deterministic or low-entropy strategy; in other words, a question about the fitness of deterministic vs. randomized strategies. Last week I expressed my skepticism towards randomized strategies for ethnocentrism; this post is meant as a potential counterbalance, with ideas we can use to look at randomized strategies.

After the LNSC meeting, on the train back to Waterloo, I did some calculations to see how to start thinking about this question: in particular, how to properly tax the cognition associated with behaving non-randomly.

The ability to perform actions (as opposed to randomly floating around in an environment) can be seen as the key quality of agency. In an EGT setting, we can capture this idea by saying that strategies that are close to random have low agency, and strategies that are close to deterministic have high agency. Having high agency requires more sophisticated cognition on the part of the agent, and thus we can associate a cost $k$ (a payoff of $-k$) with having high agency. Equivalently, we can think of giving a bonus of $k$ to low-agency players. This allows us to start looking at a toy model right away. Let us consider a general cooperator-defector game: $\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$

For this game we have the standard deterministic strategies C and D, and we now add the random strategy R, which plays C with 50% probability and D otherwise. Thus R captures the idea of the least possible agency, and we give it a bonus $k$ to fitness. Letting $p$ be the proportion of C agents and $q$ the proportion of R agents, a randomly chosen opponent plays C with probability $p + \frac{q}{2}$, and we can write down the utilities for all three strategies:

$\begin{aligned} U(C) &= (p + \frac{q}{2}) + (1 - p - \frac{q}{2})U \\ &= (p + \frac{q}{2})(1 - U) + U \\ U(D) &= (p + \frac{q}{2})V \\ U(R) &= \frac{U(C) + U(D)}{2} + k \\ &= (p + \frac{q}{2})(\frac{1 - U + V}{2}) + \frac{U}{2} + k \end{aligned}$
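As a quick numerical check of these three utilities, here is a minimal Python sketch. The function name `utilities` and the sample numbers are my own illustrative choices, not part of the model; the formulas are the ones written above.

```python
def utilities(p, q, U, V, k):
    """Expected payoffs (U(C), U(D), U(R)) in a well-mixed population.

    p: fraction of C players, q: fraction of R players (the rest play D).
    U, V: off-diagonal entries of the game matrix [[1, U], [V, 0]].
    k: fitness bonus given to the random strategy R.
    """
    x = p + q / 2              # chance that a random opponent plays C
    u_c = x * 1 + (1 - x) * U  # C earns 1 against C and U against D
    u_d = x * V                # D earns V against C and 0 against D
    u_r = (u_c + u_d) / 2 + k  # R mixes C and D evenly, plus the bonus
    return u_c, u_d, u_r


# Illustrative numbers only (hypothetical, chosen for easy arithmetic).
u_c, u_d, u_r = utilities(p=0.5, q=0.5, U=0.5, V=1.5, k=0.25)
print(u_c, u_d, u_r)  # 0.875 1.125 1.25
```

With these numbers the bonus $k = 0.25$ is large enough that R outperforms both deterministic strategies.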

From these equations, we can see that $U(R) > U(C),U(D)$ if $k > k^*$ where:

$\begin{aligned} k^* &= \frac{|U(C) - U(D)|}{2} \\ &= |(p + \frac{q}{2})(\frac{1 - (U + V)}{2}) + \frac{U}{2}| \end{aligned}$
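To spell out the step behind $k^*$, subtract each deterministic payoff from $U(R) = \frac{U(C) + U(D)}{2} + k$:

$\begin{aligned} U(R) - U(C) &= k - \frac{U(C) - U(D)}{2} \\ U(R) - U(D) &= k + \frac{U(C) - U(D)}{2} \end{aligned}$

Both differences are positive exactly when $k > \frac{|U(C) - U(D)|}{2}$, which is the threshold $k^*$.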

Thus, an agent is not willing to pay more than $k^*$ in order to have agency, because whenever it pays more than $k^*$ it is outcompeted by the non-agent strategy R. Note that $k^*$ is a function of $p$ and $q$, so it corresponds to the most that agents are willing to pay in a given population distribution. To eliminate the dependence on $p$ and $q$, we can consider a population with $q$ arbitrarily close to 1, so that $p + \frac{q}{2}$ approaches $\frac{1}{2}$. This gives us the maximum value of $k$ for which R is not an ESS: $k^{\text{ESS}} = \frac{|1 + U - V|}{4}$
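One way to see this threshold at work is to run the dynamics numerically. The sketch below assumes the standard replicator equation $\dot{x}_i = x_i (f_i - \bar{f})$ with simple Euler steps; the function names, step size, starting population, and the example game $U = V = 0.5$ are all my own illustrative choices.

```python
def fitness(p, q, U, V, k):
    """Payoffs of C, D, R when a random opponent plays C with prob. p + q/2."""
    x = p + q / 2
    f_c = x + (1 - x) * U
    f_d = x * V
    f_r = (f_c + f_d) / 2 + k
    return f_c, f_d, f_r


def replicate(p, d, q, U, V, k, dt=0.01, steps=20000):
    """Euler-integrate the replicator equation for shares (p, d, q) of C, D, R."""
    for _ in range(steps):
        f_c, f_d, f_r = fitness(p, q, U, V, k)
        mean = p * f_c + d * f_d + q * f_r  # population-average fitness
        p, d, q = (p + dt * p * (f_c - mean),
                   d + dt * d * (f_d - mean),
                   q + dt * q * (f_r - mean))
    return p, d, q


U, V = 0.5, 0.5                 # hypothetical example game
k_ess = abs(1 + U - V) / 4      # threshold from the text, 0.25 here
start = (0.01, 0.01, 0.98)      # almost everyone plays R
above = replicate(*start, U, V, k=k_ess + 0.05)
below = replicate(*start, U, V, k=k_ess - 0.15)
print("k above threshold, final (C, D, R):", above)
print("k below threshold, final (C, D, R):", below)
```

In this example game, R holds the population when $k$ is above $k^{\text{ESS}}$ and is displaced by C when $k$ is below it. Other games and starting points may behave differently, so treat this as a sanity check rather than a general result.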

In other words, if we are really close to the line in game space given by $V = 1 + U$ then agency will evolve only for very small values of $k$. Thus, we can think of this as the no agency line.

Although at first sight the non-agent strategy being 50%-50% seems natural, it is in fact arbitrary. If we take the viewpoint that the non-agent players are simply allowing the environment to pick the action for them, then there is no reason to assume that the environment is unbiased between C and D. It is very reasonable to believe that action C might be more difficult than action D (or vice versa), so that the environment's chance of selecting one or the other for the player is not 50%-50%.

We could introduce a new parameter $r$ for the environment's probability of picking C (so in the previous discussion we had $r = 0.5$) and parametrize the setting by three variables: $U$, $V$, and $r$. This modifies the previous equations to:

$\begin{aligned} U(R) &= (rU(C) + (1 - r)U(D)) + k \\ &= (p + \frac{q}{2})(r(1 - U) + (1 - r)V) + rU + k \\ k^* &= \max((1 - r)(U(C) - U(D)), r(U(D) - U(C))) \\ k^{\text{ESS}} &= \frac{\max((1 - r)(1 + U - V),r(V - U - 1))}{2} \end{aligned}$

Here $U(C)$ and $U(D)$ keep the $(p + \frac{q}{2})$ term from the $r = 0.5$ case as a simplification; if R opponents also play C with probability $r$, that term would become $(p + rq)$.
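The biased-environment thresholds can be checked numerically as well. This sketch follows the formulas exactly as written in this post, including the $(p + \frac{q}{2})$ opponent term; the function names `k_star` and `k_ess` and the sample values are hypothetical.

```python
def k_star(p, q, U, V, r):
    """Largest agency cost worth paying at population state (p, q)."""
    x = p + q / 2                     # opponent plays C (term kept as in the post)
    diff = (x + (1 - x) * U) - x * V  # U(C) - U(D)
    # R beats C when k > (1 - r) * diff, and beats D when k > -r * diff.
    return max((1 - r) * diff, -r * diff)


def k_ess(U, V, r):
    """Threshold in the limit q -> 1, where U(C) - U(D) = (1 + U - V) / 2."""
    return max((1 - r) * (1 + U - V), r * (V - U - 1)) / 2


# With r = 0.5 this reduces to |1 + U - V| / 4 from the unbiased case.
print(k_ess(U=0.5, V=0.5, r=0.5))   # 0.25
# A bias towards D (r = 0.25) changes the threshold for the same game.
print(k_ess(U=0.5, V=0.5, r=0.25))  # 0.375
```

Evaluating `k_star` at $p = 0$, $q = 1$ gives the same number as `k_ess`, which is a quick consistency check between the two formulas.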

Note that these equations are not fundamentally different from the previous ones; they just provide a linear offset to the cognitive cost. This tells us that the natural measure of complexity should be linear in the agent's degree of randomness. In other words, for a strategy space where an agent can evolve any randomness factor $s$ (its probability of playing C) and the environment has a preferred randomness $r$, the natural choice for cognitive complexity $k(s)$ is: $k(s) = k \max(\frac{s - r}{1 - r},\frac{r - s}{r})$
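A small Python sketch of this cost function makes its shape concrete. The name `agency_cost` and the input checks are my own additions; it assumes $0 < r < 1$ so that neither denominator vanishes.

```python
def agency_cost(s, r, k):
    """Piecewise-linear cognitive cost k(s) = k * max((s-r)/(1-r), (r-s)/r).

    s: agent's probability of playing C, r: environment's default
    probability of C, k: cost of a fully deterministic strategy.
    """
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    if not 0 <= s <= 1:
        raise ValueError("s must lie between 0 and 1")
    return k * max((s - r) / (1 - r), (r - s) / r)


print(agency_cost(0.5, 0.5, 1.0))   # 0.0: matching the environment is free
print(agency_cost(1.0, 0.25, 2.0))  # 2.0: always playing C costs the full k
print(agency_cost(0.0, 0.25, 2.0))  # 2.0: always playing D costs the full k
print(agency_cost(0.75, 0.5, 1.0))  # 0.5: halfway between r and pure C
```

The cost is zero at $s = r$ and rises linearly to $k$ at either deterministic endpoint, with a steeper slope on the side where $r$ is closer to the endpoint.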

Do you think this is a viable model for the cognitive cost of agency? Or should we capture agency in some other way? What do you expect will happen when we take this game over to a structured environment? I expect that, unlike the well-mixed case, in certain circumstances the agents will be willing to pay for access to randomness (instead of access to determinism).