Interface theory of perception can overcome the rationality fetish

I might be preaching to the choir, but I think the web is transformative for science. In particular, I think blogging is a great form of pre-pre-publication (and what I use this blog for), and Q&A sites like MathOverflow and the cstheory StackExchange are an awesome alternative architecture for scientific dialogue and knowledge sharing. This is why I am heavily involved with these media, and why a couple of weeks ago, I nominated myself to be a cstheory moderator. Earlier today, the election ended and Lev Reyzin and I were announced as the two new moderators alongside Suresh Venkatasubramanian, who is staying on for continuity and to teach us the ropes. I am extremely excited to work alongside Suresh and Lev, and to do my part to continue developing the great community that we nurtured over the last three and a half years.

However, I do expect to face some challenges. The only critique raised against our outgoing moderators was that an argumentative attitude that is acceptable for a normal user can be unfitting for a mod. I definitely have an argumentative attitude, and so I will have to be extra careful to be on my best behavior.

Thankfully, being a moderator on cstheory does not change my status elsewhere on the website, so I can continue to be a normal argumentative member of the Cognitive Sciences StackExchange. That site is already home to one of my most heated debates against the rationality fetish. In particular, I was arguing against the statement that “a perfect Bayesian reasoner [is] a fixed point of Darwinian evolution”. This statement can be decomposed into two key assumptions: (1) a perfect Bayesian reasoner makes the most veridical decisions given its knowledge, and (2) veridicity has greater utility for an agent and will be selected for by natural selection. If we accept both premises, then a perfect Bayesian reasoner is a fitness peak. Of course, as we learned before, the fact that something is a fitness peak doesn’t mean we can ever find it.

We can also challenge both of the assumptions (Feldman, 2013); the first on philosophical grounds, and the second on scientific. I want to concentrate on debunking the second assumption because it relates closely to our exploration of objective versus subjective rationality. To make the discussion more precise, I’ll approach the question from the point of view of perception — a perspective I discovered thanks to TheEGG blog; in particular, the comments of recent reader Zach M.

Baldwin effect and overcoming the rationality fetish

G.G. Simpson and J.M. Baldwin


As I’ve mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn’t surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a “perfect Bayesian reasoner as a fixed point of Darwinian evolution”. This lets them side-step observed non-Bayesian behavior in humans, by saying that we are evolving towards, but haven’t yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is naive of critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin, then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three-step process (some steps of which can occur in parallel or partially so):

  1. Organisms adapt to the environment individually.
  2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
  3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan 1886; Osborn 1886, 1887) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.

Replicator dynamics of cooperation and deception

In my last post, I mentioned how conditional behavior usually implied a transfer of information from one agent to another, and that conditional cooperation was therefore vulnerable to exploitation through misrepresentation (deception). Little did I know that an analytic treatment of that point had been published a couple of months before.

McNally & Jackson (2013), the same authors who used neural networks to study the social brain hypothesis, present a simple game theoretic model to show that the existence of cooperation creates selection for tactical deception. As other commentators have pointed out, this is a rather intuitive conclusion, but of real interest here are how this relationship is formalized and whether this model maps onto reality in any convincing way. Interestingly, the target model is reminiscent of Artem’s perception and deception models, so it’s worth bringing them up for comparison; I’ll refer to them as Model 1 and Model 2.

Egalitarians’ dilemma and the cognitive cost of ethnocentrism

Ethnocentrism (or contingent altruism) can be viewed as one of many mechanisms for enabling cooperation. The agents are augmented with a hereditary tag and the strategy space is extended from just cooperation/defection to behaviour that can be contingent on whether the two agents in a dyad share a tag or differ in it. The tags and strategy are not inherently correlated, but can develop local correlations due to system dynamics. This can expand the range of environments in which cooperation can be maintained, but an assortment-biasing mechanism is needed to fuel the initial emergence of cooperation (Kaznatcheev & Shultz, 2011). The resulting cooperation is extended only towards the in-group while the out-group continues to be treated with the cold rationality of defection.

Suppose that circles are the in-group and squares the out-group. The four possible strategies and their minimal representations as finite state machines are given.


The four possible strategies are depicted above, from top to bottom: humanitarian, ethnocentric, traitorous, and selfish. Humanitarians and selfish agents do not condition their behavior on the tag of their partner, and do not require the cognitive ability to categorize. Although this ability is simple, it can still merit a rich analysis (see: Beer, 2003) by students of minimal cognition. By associating a small fitness cost k with categorization, we can study how much ethnocentric (and traitorous) agents are willing to pay for their greater cognitive abilities. This cost directly changes the default probability to reproduce (\text{ptr}), with humanitarians and selfish agents having \text{ptr} = 0.11 and ethnocentric and traitorous agents having \text{ptr} = 0.11 - k. During each cycle, the \text{ptr} is further modified by the game interactions, with each cooperative action costing c = 0.01 and providing a benefit b (that varies depending on the simulation parameters) to the partner. For a more detailed presentation of the simulation and default parameters, or just to follow along on your computer, I made my code publicly available on GitHub. Pardon its roughness; the brunt of it is legacy code from when I first built this model in 2009 for Kaznatcheev (2010).
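As a rough illustration of the bookkeeping described above, here is a minimal Python sketch of how the probability to reproduce could be computed for one agent in one cycle. It is not the code from the GitHub repository; the function and argument names are hypothetical, and clamping the result to [0, 1] is an assumption made for this sketch.

```python
BASE_PTR = 0.11  # default probability to reproduce, as stated in the text
COST_C = 0.01    # cost c of each cooperative action, as stated in the text

# Strategies that condition on the partner's tag and so pay the cognitive cost k.
CATEGORIZING = ("ethnocentric", "traitorous")


def probability_to_reproduce(strategy, k, b, given, received):
    """ptr after one cycle of interactions.

    given    -- number of cooperative actions this agent performed
    received -- number of cooperative actions directed at this agent
    (both argument names are hypothetical, chosen for this sketch)
    """
    ptr = BASE_PTR - (k if strategy in CATEGORIZING else 0.0)
    ptr += received * b - given * COST_C
    # Clamping to a valid probability is an assumption of this sketch.
    return min(1.0, max(0.0, ptr))


# An ethnocentric that cooperated twice and was helped three times.
example = probability_to_reproduce("ethnocentric", k=0.002, b=0.025,
                                   given=2, received=3)
```

With these numbers the cognitive cost is small next to the game payoffs, which is why the low-k runs look so much like the model without a cognitive cost.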

Number of agents by strategy versus evolutionary cycle. The lines represent the number of agents of each strategy: blue --  humanitarian; green -- ethnocentric; yellow -- traitorous; red -- selfish. The width of the line corresponds to standard error from averaging 30 independent runs. The two figures correspond to different costs of cognition. The left is k = 0.002 and is typical of runs before the cognitive cost phase transition. The right is k = 0.007 and is typical of runs after the cognitive cost phase transition. Figure is adapted from Kaznatcheev (2010).


The dynamics for low k are about the same as in the standard model without a cognitive cost, as can be seen from the left figure above. However, as k increases there is a transition to a regime where humanitarians start to dominate the population, as in the right figure above. To study this, I ran simulations with a fixed b/c ratio, increasing k from 0.001 to 0.02 in steps of 0.001. You can run your own with the command bcRun(2.5,0.001*(1:20)). Some results are presented below; yours might differ slightly due to the stochastic nature of the simulation.


Proportion of humanitarians (blue), ethnocentrics (red), and cooperative interactions (black) versus cognitive cost for b/c = 2.5. Dots are averages from evolutionary cycles 9000 to 10000 of 10 independent runs. The lines are best-fit sigmoids and the dotted lines mark the steepest point; I take this as the point for the cognitive cost phase transition. Data generated with bcRun(2.5,0.001*(1:20)) and visualized with bcPlot(2.5,0.001*(1:20),[],1)

Each data-point is the average from the last 1000 cycles of 10 independent simulations. The points suggest a phase transition from a regime of few humanitarians (blue), many ethnocentrics (red), and very high cooperation (black) to one of many humanitarians, few ethnocentrics, and slightly less cooperation. To get a better handle on exactly where the phase transition is, I fit sigmoids to the data using fitSigmoid.m. The best-fit curves are shown as solid lines; I defined the point of phase transition as the steepest (or inflection) point on the curve and plotted them with dashed lines for reference. I am not sure if this is the best approach to quantifying the point of phase transition, since the choice of sigmoid function is arbitrary and based only on the qualitative feel of the function. It might be better to fit a simpler function like a step-function or a more complicated function from which a critical exponent can be estimated. Do you know a better way to identify the phase transition? At the very least, I have to properly measure the error on the averaged data points and propagate it through the fit to get error bounds on the sigmoid parameters and make sure that, within statistical certainty, all 3 curves have their phase transition at the same point.
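To make the "steepest point of a fitted sigmoid" idea concrete, here is a small Python sketch. It is not fitSigmoid.m: the four-parameter logistic form, the coarse grid search, and all names are assumptions of this sketch, and the data are synthetic values generated from a known curve rather than simulation output. For a logistic, the inflection point is simply the midpoint parameter k0.

```python
import math


def logistic(k, low, high, k0, rate):
    """Four-parameter logistic; its steepest (inflection) point is at k = k0."""
    return low + (high - low) / (1.0 + math.exp(-rate * (k - k0)))


def fit_logistic(ks, ys, rates=(250.0, 500.0, 1000.0, 2000.0), steps=200):
    """Coarse grid-search least squares over (k0, rate).

    The plateaus low and high are pinned to the first and last data values,
    a simplification made for this sketch.
    """
    low, high = ys[0], ys[-1]
    k_min, k_max = min(ks), max(ks)
    best_err, best_k0, best_rate = float("inf"), None, None
    for i in range(steps + 1):
        k0 = k_min + (k_max - k_min) * i / steps
        for rate in rates:
            err = sum((logistic(k, low, high, k0, rate) - y) ** 2
                      for k, y in zip(ks, ys))
            if err < best_err:
                best_err, best_k0, best_rate = err, k0, rate
    return best_k0, best_rate


# Synthetic, noise-free illustration data with a known transition at k = 0.006.
ks = [0.001 * i for i in range(1, 21)]
ys = [logistic(k, 0.1, 0.8, 0.006, 1000.0) for k in ks]
k0_estimate, rate_estimate = fit_logistic(ks, ys)
```

On real, noisy averages a proper nonlinear least-squares routine that also reports parameter uncertainties would be the better tool, since that is what the error propagation mentioned above needs.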

The most interesting feature of the phase transition is the effect on cooperation. The world becomes more equitable; agents that treat out-groups differently from in-groups (ethnocentrics) are replaced by agents that treat everyone with equal good-will and cooperation (humanitarians). However, the overall proportion of cooperative interactions decreases; it seems that humanitarians are less effective at suppressing selfish agents. This is consistent with the free-rider suppression hypothesis that Shultz et al. (2009) believed to be implausible. The result is the egalitarians’ dilemma: by promoting equality among agents the world becomes less cooperative. Should one favour equality and thus individual fairness over the good of the whole population? If we expand our moral circle to eliminate out-groups will that lead to less cooperation?

In the prisoners’ dilemma, we are inclined to favor the social good over the individual. Even though it is rational for the individual to defect (securing a higher payoff for themselves than cooperating), we believe it is better for both parties to cooperate (securing a better social payoff than mutual defection). But in the egalitarians’ dilemma we are inclined to favour the individualistic strategy (fairness for each) over the social good (higher average levels of cooperative interactions). We see a similar effect in the ultimatum game: humans reject unfair offers even though that results in neither player receiving a payoff (worse for both). In some ways, we can think of the egalitarians’ dilemma as the population analogue of the ultimatum game; should humanity favor fairness over higher total cooperation?

I hinted at some of these questions in Kaznatcheev (2010) but I restricted myself to just b/c = 2.5. From this limited data, I concluded that since the phase transition happens for k less than any other parameter in the model, it must be the case that agents are not willing to invest many resources into developing larger brains capable of categorical perception just to benefit from an ethnocentric strategy. Ethnocentrism and categorical perception would not have co-evolved; the basic cognitive abilities would have to be in place by some other means (or be incredibly cheap) and then tag-based strategies could emerge.

Points of phase transition

Value of k at phase transition versus b/c ratio. In blue is the transition in proportion of humanitarians, red — proportion of ethnocentrics, and black – proportion of cooperative interactions. Each data point is made from a parameter estimate done using a sigmoid best fit to 200 independent simulations over 20 values of k at a resolution of 0.001.

Here, I explored the parameter space further by repeating the above procedure while varying the b/c ratio: b changes from 0.02 to 0.035 in increments of 0.0025 while c stays fixed at 0.01. Unsurprisingly, the transitions for proportion of ethnocentrics and humanitarians are indistinguishable, but without a proper analysis it is not clear if the transition from high to low cooperation also always coincides. For b/c > 2.75, agents are willing to invest more than c before the phase transition to all humanitarians; this invalidates my earlier reasoning. Agents are unwilling to invest many resources in larger brains capable of categorical perception only in competitive environments (low b/c). As b increases, the agents are willing to invest more in their perception to avoid giving this large benefit to the out-group. This seems consistent with the explicit out-group hostility that Kaznatcheev (2010b) observed in the harmony game. However, apart from simply presenting the data, I can’t make much more sense of this figure. Do you have any interpretations? Can we learn something from the seemingly linear relationship? Does the slope (if we plot k versus b then it is about 0.5) tell us anything? Would you still conclude that co-evolution of tag-based cooperation and categorical perception is unlikely?


Beer, Randall D. (2003). The Dynamics of Active Categorical Perception in an Evolved Model Agent. Adaptive Behavior. 11(4): 209-243.

Kaznatcheev, A. (2010). The cognitive cost of ethnocentrism. Proceedings of the 32nd Annual Conference of the Cognitive Science Society.

Kaznatcheev, A. (2010b). Robustness of ethnocentrism to changes in inter-personal interactions. Complex Adaptive Systems – AAAI Fall Symposium.

Kaznatcheev, A., & Shultz, T. R. (2011). Ethnocentrism maintains cooperation, but keeping one’s children close fuels it. Proceedings of the 33rd Annual Conference of the Cognitive Science Society. 3174-3179.

Shultz, T. R., Hartshorn, M., & Kaznatcheev, A. (2009). Why is ethnocentrism more common than humanitarianism? Proceedings of the 31st Annual Conference of the Cognitive Science Society. 2100-2105.

Evolution of ethnocentrism in the Hammond and Axelrod model

Ethnocentrism is the tendency to favor one’s own group at the expense of others; a bias towards those similar to us. Many social scientists believe that ethnocentrism derives from cultural learning and depends on considerable social and cognitive abilities (Hewstone, Rubin, & Willis, 2002). However, the only fundamental requirement for implementing ethnocentrism is categorical perception. This minimal cognition already merits a rich analysis (Beer, 2003) but is only one step above always cooperating or defecting. Thus, considering strategies that can discriminate in-groups and out-groups is one of the first steps in following the biogenic approach (Lyon, 2006) to game theoretic cognition. In other words, by studying ethnocentrism from an evolutionary game theory perspective, we are trying to follow the bottom-up approach to rationality. Do you know other uses of evolutionary game theory in the cognitive sciences?

The model I am most familiar with for looking at ethnocentrism (in biology circles, usually called the green-beard effect) is Hammond & Axelrod’s (2006) agent-based model. I present an outline of the model (with my slight modifications) and some basic results.


The world is a square toroidal lattice. Each cell has four neighbors: east, west, north, south. A cell can be inhabited or uninhabited by an agent. The agent (say Alice) is defined by 3 traits: the cell she inhabits; her strategy; and her tag. The tag is an arbitrary quality, and Alice can only perceive if she has the same tag as Bob (the agent she is interacting with) or that their tags differ. When I present this, I usually say that Alice thinks she is a circle and perceives others with the same tag as circles, but those with a different tag as squares. This allows her to have 4 strategies:

The blue Alice cooperates with everybody, regardless of tag; she is a humanitarian. The green Alice cooperates with those of the same tag, but defects against those with a different tag; she is ethnocentric. The other two strategies follow a similar pattern and are traitorous (yellow) and selfish (red). The strategies are deterministic, but we saw earlier that a mixed-strategy approach doesn’t change much.
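The four strategies can be written as a two-entry lookup, one entry for a same-tag partner and one for a different-tag partner. This is only a sketch with hypothetical names; the traitorous and selfish rows follow the usual Hammond & Axelrod definitions (traitorous cooperates only with the out-group, selfish with nobody).

```python
# strategy -> (cooperate with same tag?, cooperate with different tag?)
STRATEGIES = {
    "humanitarian": (True, True),    # blue
    "ethnocentric": (True, False),   # green
    "traitorous": (False, True),     # yellow
    "selfish": (False, False),       # red
}


def cooperates(strategy, own_tag, partner_tag):
    """Alice only perceives whether Bob's tag matches hers, nothing more."""
    with_same, with_different = STRATEGIES[strategy]
    return with_same if own_tag == partner_tag else with_different
```

Only the two middle rows need to look at the tag at all, which is the sense in which ethnocentric and traitorous agents require categorical perception.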

The simulations follow 3 stages (as summarized in the picture below):

  1. Interaction – agents in adjacent cells play the game with each other (usually a prisoner’s dilemma), choosing to cooperate or defect in each pair-wise interaction. The payoffs of the games are added to their base probability to reproduce (ptr) to arrive at each agent’s actual probability to reproduce.
  2. Reproduction – each agent rolls a die in accordance with their probability to reproduce. If they succeed, they produce an offspring, which is placed in a list of children-to-be-placed.
  3. Death and Placement – each agent on the lattice has a constant probability of dying and vacating their cell. The children-to-be-placed list is randomly permuted and we try to place each child in a cell adjacent to (or in place of) their parent if one is empty. If no empty cell is found, then the child dies.
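The three stages above can be sketched in a few lines of Python. This is a simplified illustration rather than the actual simulation code: the lattice size, the number of tags, the seeding, and the uniform choice among empty cells are assumptions of this sketch, mutation is left out, and the parameter values are the defaults quoted in the figure caption further down (ptr = 0.1; death = 0.1; b = 0.025; c = 0.01).

```python
import random

SIZE = 10                  # lattice side length (assumed for this sketch)
BASE_PTR, DEATH = 0.1, 0.1
B, C = 0.025, 0.01
# strategy = (cooperate with same tag?, cooperate with different tag?)
STRATEGIES = [(True, True), (True, False), (False, True), (False, False)]


def neighbours(cell):
    """East, west, north, south on a torus."""
    x, y = cell
    return [((x + 1) % SIZE, y), ((x - 1) % SIZE, y),
            (x, (y + 1) % SIZE), (x, (y - 1) % SIZE)]


def interaction(world):
    """Stage 1: return each agent's probability to reproduce after the games."""
    ptr = {cell: BASE_PTR for cell in world}
    for cell, (tag, strategy) in world.items():
        for other in neighbours(cell):
            if other in world:
                same_tag = world[other][0] == tag
                if strategy[0] if same_tag else strategy[1]:
                    ptr[cell] -= C    # the donor pays the cost
                    ptr[other] += B   # the partner receives the benefit
    return ptr


def cycle(world, rng):
    """Run one full cycle in place; world maps cell -> (tag, strategy)."""
    ptr = interaction(world)
    # Stage 2: reproduction into a list of children-to-be-placed.
    children = [(cell, world[cell]) for cell in world
                if rng.random() < ptr[cell]]
    # Stage 3: death, then placement in random order.
    for cell in list(world):
        if rng.random() < DEATH:
            del world[cell]
    rng.shuffle(children)
    for parent_cell, child in children:
        empty = [c for c in [parent_cell] + neighbours(parent_cell)
                 if c not in world]
        if empty:                      # otherwise the child dies
            world[rng.choice(empty)] = child
    return world


# Seed one agent per strategy-tag combination (4 tags assumed), then run.
rng = random.Random(1)
cells = rng.sample([(x, y) for x in range(SIZE) for y in range(SIZE)], 16)
world = {cell: (i % 4, STRATEGIES[i // 4]) for i, cell in enumerate(cells)}
for _ in range(50):
    cycle(world, rng)
```

Counting the agents in `world` by strategy after each call to `cycle` gives the kind of strategy-distribution curve usually tracked in this model.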

Simulation cycle of the Hammond & Axelrod model

The usual tracked quantities are the distribution of strategies (how many agents follow each strategy) and the proportion of cooperative interactions (the fraction of interactions where both parties chose to cooperate). The world starts with a few agents of each strategy-tag combination and fills up over time.


The early results on the evolution of ethnocentrism are summarized in the following plot.

Early results in the H&A model

Number of agents grouped by strategy versus evolutionary cycle. Humanitarians are blue, ethnocentrics are green, traitorous are yellow, and selfish are red. The results are an average of 30 runs of the H&A model (default ptr = 0.1; death = 0.1; b = 0.025; c = 0.01) with line thickness representing the standard error. The boxes highlight the nature of early results on the H&A model.

Hammond and Axelrod (2006) showed that, after a transient period, ethnocentric agents dominate the population; humanitarians are the second most common, and traitorous and selfish agents are both extremely uncommon. Shultz, Hartshorn, and Hammond (2008) examined the transient period to uncover evidence for early competition between ethnocentric and humanitarian strategies. Shultz, Hartshorn, and Kaznatcheev (2009) focused on explaining the mechanism behind ethnocentric dominance over humanitarians, and observed the co-occurrence of world saturation and humanitarian decline. Kaznatcheev and Shultz (2011) concluded that it is the spatial aspect of the model that creates cooperation; being able to discriminate tags helps maintain cooperation and extend the range of parameters under which it can occur.

As you might have noticed from the simple DFAs drawn in the strategies figure, ethnocentric and traitorous agents are more complicated than humanitarian or selfish ones; they are more cognitively complex. Kaznatcheev (2010a) showed that ethnocentrism is not robust to increases in the cost of cognition. Thus, in humans (or simpler organisms) the mechanism allowing discrimination has to have been in place already (and not co-evolved) or be very inexpensive. Kaznatcheev (2010a) also observed that ethnocentrics maintain higher levels of cooperation than humanitarians. Thus, although ethnocentrism seems unfair due to its discriminatory nature, it is not clear that it produces a less friendly world.

The above examples dealt with the prisoner’s dilemma (PD) which is a typical model of a competitive environment. In the PD cooperation is irrational, so ethnocentrism allowed the agents to cooperate irrationally (thus moving over to the better social payoff), while still treating those of a different culture rationally and defecting from them. Unfortunately, Kaznatcheev (2010b) demonstrated that ethnocentric behavior is robust across a variety of games, even when out-group hostility is classically irrational (the harmony game). In the H&A model, ethnocentrism is a two-edged sword: it can cause unexpected cooperative behavior, but also irrational hostility.


Beer, R. D. (2003). The dynamics of active categorical perception in an evolved model agent. Adaptive Behavior, 11(4), 209-243.

Hammond, R., & Axelrod, R. (2006). The evolution of ethnocentrism. Journal of Conflict Resolution, 50(6), 926-936. DOI: 10.1177/0022002706293470

Hewstone, M., Rubin, M., & Willis, H. (2002). Intergroup bias. Annual Review of Psychology, 53, 575-604.

Kaznatcheev, A. (2010a). The cognitive cost of ethnocentrism. In S. Ohlsson & R. Catrambone (Eds.), Proceedings of the 32nd annual conference of the cognitive science society. (pdf)

Kaznatcheev, A. (2010b). Robustness of ethnocentrism to changes in inter-personal interactions. Complex Adaptive Systems – AAAI Fall Symposium. (pdf)

Kaznatcheev, A., & Shultz, T.R. (2011). Ethnocentrism maintains cooperation, but keeping one’s children close fuels it. In L. Carlson, C. Hoelscher, & T.F. Shipley (Eds.), Proceedings of the 33rd annual conference of the cognitive science society. (pdf)

Lyon, P. (2006). The biogenic approach to cognition. Cognitive Processing, 7, 11-29.

Shultz, T. R., Hartshorn, M., & Hammond, R. A. (2008). Stages in the evolution of ethnocentrism. In B. Love, K. McRae, & V. Sloutsky (Eds.), Proceedings of the 30th annual conference of the cognitive science society.

Shultz, T. R., Hartshorn, M., & Kaznatcheev, A. (2009). Why is ethnocentrism more common than humanitarianism? In N. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st annual conference of the cognitive science society.

Presentation on evolutionary game theory and cognition

Last week Julian sent me an encouraging email:

did you know that you come up first in google for a search “evolutionary theory mcgill university”? You beat all the profs!

The specific link he was talking about was to my slides from the first time I gave a guest lecture for Tom’s Cognitive Science course in 2009. Today, I gave a similar lecture again; my 3rd year in a row giving a guest lecture for PSYC532. The slides are available here.

I am very happy Tom invited me. It is always fun to share my passion for EGT with students, and I like motivating the connections to cognition. As is often the case, some of the questions during the presentation got me thinking. A particular question I enjoyed was along the lines of:

If humanitarians cooperate with everyone, and ethnocentrics only cooperate with in-group, then how can we have lower levels of cooperation when the world is dominated by humanitarians?

This was in reference to a result I presented in [Kaz10] about the decrease in cooperative interactions as the cognitive costs of ethnocentrism increases. In particular, even though ethnocentrics are replaced by humanitarians in the population, we don’t see an increase in the proportion of cooperative interactions. In fact, it triggers a decrease in the proportion of cooperative interactions.

I started with my usual answer of the humanitarians allowing more selfish agents to survive, but then realized a second important factor. When the ethnocentric agents are a minority, they no longer form giant same-tag clusters, and are thus much more likely to be defecting (since they are meeting agents of other tags) than cooperating. Thus, the sizable minority of ethnocentrics tend to defect and decrease the proportion of cooperation when living among a majority of humanitarians. On the other hand, when the ethnocentrics are in the majority they are in same-tag clumps and thus tend to cooperate. Of course, I should more closely analyze the simulation data to test this story.

Another attentive student caught a mistake on slide 14 (page 19 of pdf). I have left it for consistency, but hopefully other attentive readers will also notice it. Thank you for being sharp and catching an error I’ve had in my slides for 2 or 3 years now!

To all the students that listened and asked great questions: thank you! If you have any more queries please ask them in the comments. To everyone: how often do you get new insights from the questions you receive during your presentations?


[Kaz10] Kaznatcheev, A. (2010) “The cognitive cost of ethnocentrism.” Proceedings of the 32nd annual conference of the cognitive science society. [pdf]

Replicator dynamics and cognitive cost of agency

During the June 3rd Friday meeting of the LNSC, Akash Venkat asked a fun question of how simple it is to remember your strategy. I interpreted this as a question about how difficult it is to have a deterministic or low-entropy strategy. In other words, a question about the fitness of deterministic vs. randomized strategies. Last week I expressed my skepticism towards randomized strategies for ethnocentrism; this post is meant as a potential counter-balance, with ideas we can use to look at randomized strategies.

After the LNSC meeting as I was on the train back to Waterloo, I did some calculations to see how to start thinking about this question. In particular, how to properly tax the cognition associated with behaving non-randomly.

The ability to perform actions (as opposed to randomly floating around in an environment) can be seen as the key quality of agency. In an EGT setting, we can capture this idea by saying that strategies that are close to random have low agency, and strategies that are close to deterministic have high agency. Having high agency requires more sophisticated cognition on the part of the agent, and thus we can associate a cost k with having high agency. Equivalently, we can think of giving a bonus of k to low agency players. This allows us to start looking at a toy model right away. Let us consider a general cooperator-defector game:

\begin{pmatrix}  1 & U \\  V & 0  \end{pmatrix}

For this game we have the standard deterministic strategies C and D, and we will now add the random strategy R which with a 50% probability plays C and otherwise plays D. Thus R captures the idea of the least possible agency, and we give it a bonus k to fitness. Letting p be the proportion of C agents, and q the proportion of R agents, we can write down the utilities for all three strategies:

\begin{aligned}  U(C) & = (p + q/2) + (1 - p - q/2)U \\  &= (p + q/2)(1 - U) + U \\  U(D) &= (p + q/2)V \\  U(R) &= \frac{U(C) + U(D)}{2} + k \\  &= (p + \frac{q}{2})(\frac{1 - U + V}{2}) + \frac{U}{2} + k  \end{aligned}

From these equations, we can see that U(R) > U(C),U(D) if k > k^* where:

\begin{aligned}  k^* & = \frac{|U(C) - U(D)|}{2} \\  & = |(p + \frac{q}{2})(\frac{1 - (U + V)}{2}) + \frac{U}{2}|  \end{aligned}

Thus, an agent is not willing to pay more than k^* in order to have agency, because whenever they pay more than k^* they will be dominated by the non-agent R strategy. Note that k^* is a function of p and q, thus it corresponds to the most agents are willing to pay in a given population distribution. To eliminate the dependence on p and q we can consider a population with q arbitrarily close to 1. This will give us a maximum value of k for which R is not ESS:

k^{\text{ESS}} = \frac{|1 + U - V|}{4}
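As a quick numerical check of the derivation, the utilities and thresholds above can be written directly in Python. The function names are hypothetical; the formulas are the ones given in the text for the 50%-50% random strategy R.

```python
def utilities(p, q, U, V, k):
    """Utilities of C, D and R in the game [[1, U], [V, 0]].

    p is the proportion of C agents, q the proportion of R agents,
    and k the fitness bonus given to the low-agency strategy R.
    """
    x = p + q / 2.0                  # chance that a random partner plays C
    u_c = x + (1.0 - x) * U
    u_d = x * V
    u_r = (u_c + u_d) / 2.0 + k
    return u_c, u_d, u_r


def k_star(p, q, U, V):
    """Largest cost worth paying for agency in population (p, q)."""
    u_c, u_d, _ = utilities(p, q, U, V, 0.0)
    return abs(u_c - u_d) / 2.0


def k_ess(U, V):
    """Limit of k_star as q approaches 1 (so p approaches 0)."""
    return abs(1.0 + U - V) / 4.0


# Example game with U = -0.5 and V = 1.5, in an all-R population.
threshold = k_star(0.0, 1.0, -0.5, 1.5)
```

Here `threshold` agrees with `k_ess(-0.5, 1.5)`, and any game on the line V = 1 + U gives a threshold of zero, which is the no agency line discussed next.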

In other words, if we are really close to the line in game space given by V = 1 + U then agency will evolve only for very small values of k. Thus, we can think of this as the no agency line.

Although at first sight the non-agent strategy being 50%-50% seems natural, it is in fact arbitrary. If we are taking the viewpoint that the non-agent players are simply allowing the environment to pick the action for them, then there is no reason to assume that the environment is unbiased between C and D. It is very reasonable to believe that action C might be more difficult than action D (or vice-versa) and the environment will have a non 50%-50% chance of selecting one or the other for the player.

We could introduce a new parameter r for the environment’s probability to pick C (so in the previous discussion we had r = 0.5) and parametrize the environment by 3 variables: U, V, and r. This would modify the previous equations into:

\begin{aligned}  U(R) & = (rU(C) + (1 - r)U(D)) + k \\  & = (p + \frac{q}{2})(r(1 - U) + (1 - r)V) + rU + k \\  k^* & = \max((1 - r)(U(C) - U(D)), r(U(D) - U(C))) \\  k^{\text{ESS}} & = \frac{\max((1 - r)(1 + U - V),r(V - U - 1))}{2}  \end{aligned}

Note that this equation is not fundamentally different from the previous ones, and just provides a linear offset to the cognitive cost. This tells us that the natural measure of complexity should be linear in the agent’s degree of randomness. In other words, for a strategy space where an agent can evolve any randomness factor s and the environment has a preferred randomness r the natural choice for cognitive complexity k(s) is:

k(s) = k \max(\frac{s - r}{1 - r},\frac{r - s}{r})
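The cost function is easy to evaluate directly. The sketch below uses a hypothetical function name and assumes that s and r are both probabilities of playing C, with r strictly between 0 and 1 so that neither denominator vanishes.

```python
def agency_cost(s, r, k):
    """Cognitive cost k(s) of an agent whose randomness factor is s.

    s -- the agent's evolved probability of playing C (assumed reading of s)
    r -- the environment's preferred probability of playing C, 0 < r < 1
    k -- cost of a fully deterministic strategy
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    return k * max((s - r) / (1.0 - r), (r - s) / r)


# The cost is zero at s = r and grows linearly to k at either pure strategy.
costs = [agency_cost(s, 0.3, 0.01) for s in (0.0, 0.3, 1.0)]
```

Both pure strategies cost the full k regardless of r, while following the environment's bias costs nothing, which matches the piecewise-linear shape of the formula.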

Do you think this is a viable model for the cognitive cost of agency? Or should we capture agency in some other way? What do you expect will happen when we take this game over to a structured environment? I expect that unlike the well mixed case, in certain circumstances, the agents will be willing to pay for access to randomness (instead of access to determinism).