## Interface theory of perception can overcome the rationality fetish

I might be preaching to the choir, but I think the web is transformative for science. In particular, I think blogging is a great form of pre-pre-publication (and what I use this blog for), and Q&A sites like MathOverflow and the cstheory StackExchange are an awesome alternative architecture for scientific dialogue and knowledge sharing. This is why I am heavily involved with these media, and why a couple of weeks ago, I nominated myself to be a cstheory moderator. Earlier today, the election ended and Lev Reyzin and I were announced as the two new moderators alongside Suresh Venkatasubramanian, who is staying on for continuity and to teach us the ropes. I am extremely excited to work alongside Suresh and Lev, and to do my part to continue developing the great community that we nurtured over the last three and a half years.

However, I do expect to face some challenges. The only critique raised against our outgoing moderators was that an argumentative attitude that is acceptable for a normal user can be unfitting for a mod. I definitely have an argumentative attitude, and so I will have to be extra careful to be on my best behavior.

Thankfully, being a moderator on cstheory does not change my status elsewhere on the website, so I can continue to be a normal argumentative member of the Cognitive Sciences StackExchange. That site is already home to one of my most heated debates against the rationality fetish. In particular, I was arguing against the statement that “a perfect Bayesian reasoner [is] a fixed point of Darwinian evolution”. This statement can be decomposed into two key assumptions: (1) a perfect Bayesian reasoner makes the most veridical decisions given its knowledge, and (2) veridicality has greater utility for an agent and will be selected for by natural selection. If we accept both premises then a perfect Bayesian reasoner is a fitness peak. Of course, as we learned before: just because something is a fitness peak doesn’t mean we can ever find it.

We can also challenge both of the assumptions (Feldman, 2013); the first on philosophical grounds, and the second on scientific. I want to concentrate on debunking the second assumption because it relates closely to our exploration of objective versus subjective rationality. To make the discussion more precise, I’ll approach the question from the point of view of perception — a perspective I discovered thanks to TheEGG blog; in particular, the comments of recent reader Zach M.

## Cooperation through useful delusions: quasi-magical thinking and subjective utility

Economists who take bounded rationality seriously treat their research like a chess game and follow the reductive approach: start with all the pieces (a fully rational agent) and kill/capture/remove pieces until the game ends, i.e. see what sort of restrictions can be placed on the agents to deviate from rationality and better reflect human behavior. Sometimes these restrictions can be linked to evolution, but usually the models are independent of evolutionary arguments. In contrast, evolutionary game theory has traditionally played Go and concerned itself with the simplest agents that are only capable of behaving according to a fixed strategy specified by their genes: no learning, no reasoning, no built-in rationality. If egtheorists want to approximate human behavior then they have to play new stones and take a constructive approach: start with genetically predetermined agents and build them up to better reflect the richness and variety of human (or even other animal) behaviors (McNamara, 2013). I’ve always preferred Go over chess, and so I am partial to the constructive approach toward rationality. I like to start with replicator dynamics and work my way up, adding agency, perception and deception, ethnocentrism, or emotional profiles and general conditional behavior.

Most recently, my colleagues and I have been interested in the relationship between evolution and learning, both individual and social. A key realization has been that evolution takes cues from an external reality, while learning is guided by a subjective utility, and there is no a priori reason for those two incentives to align. As such, we can have agents acting rationally on their genetically specified subjective perception of the objective game. To avoid making assumptions about how agents might deal with risk, we want them to know a probability that others will cooperate with them. However, this depends on the agent’s history and local environment, so each agent should learn these probabilities for itself. In our previous presentation of results we concentrated on the case where the agents were rational Bayesian learners, but we know that this is an assumption not justified by evolutionary models or observations of human behavior. Hence, in this post we will explore the possibility that agents can have learning peculiarities like quasi-magical thinking, and how these peculiarities can co-evolve with subjective utilities.

## Quasi-magical thinking and superrationality for Bayesian agents

As part of our objective and subjective rationality model, we want a focal agent to learn the probability that others will cooperate given that the focal agent cooperates ($p$) or defects ($q$). In a previous post we saw how to derive point estimates for $p$ and $q$ (and learnt that they are the posterior means under a uniform prior, rather than the maximum likelihood estimates):

$p_0 = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$, and $q_0 = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$

where $n_{XY}$ is the number of times Alice displayed behavior $X$ and saw Bob display behavior $Y$. In the above equations, a number like $n_{CD}$ is interpreted by Alice as “the number of times I cooperated and Bob ‘responded’ with a defection”. I put ‘responded’ in quotations because Bob cannot actually condition his behavior on Alice’s action. Note that in this view, Alice is placing herself in a special position of actor, and observing Bob’s behavior in response to her actions; she is failing to put herself in Bob’s shoes. Instead, she can realize that Bob would be interested in doing the same sort of sampling, and interpret $n_{CD}$ more neutrally as “number of times agent 1 cooperates and agent 2 defects”; in this case she will see that for Bob, the equivalent quantity is $n_{DC}$.

## Quasi-magical thinking and the public good

Cooperation is a puzzle because it is not obvious why cooperation, which is good for the group, is so common, despite the fact that defection is often best for the individual. Though we tend to view this issue through the lens of the prisoner’s dilemma, Artem recently pointed me to a paper by Joanna Masel, a mathematical biologist at Stanford, focusing on the public goods game [1]. In this game, each player is given 20 tokens and chooses how many of these they wish to contribute to the common pool. Once players have made their decisions, the pool is multiplied by some factor m (where 1 < m < n, with n the number of players) and the pool is distributed equally back to all players. To optimize the group’s payoff, players should take advantage of the pool’s multiplicative effects by contributing all of their tokens. However, because a player’s share does not depend on the size of their contribution, it is easy to see that contributing everything is not the best individual strategy; the Nash equilibrium is to contribute nothing. By contributing nothing to the common pool, a player gets a share of the pool in addition to keeping all of the tokens they initially received. This conflict captures the puzzle of cooperation, which in this case is: Why do human participants routinely contribute about half of their funds, if never contributing is individually optimal?
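To make the tension concrete, here is a minimal sketch of one round of the game. The function name, the choice of four players, and the multiplier of 2 are illustrative assumptions rather than values from the paper; the sketch also assumes the multiplier lies strictly between 1 and the number of players, which is what makes the game a dilemma.

```python
def public_goods_payoffs(contributions, multiplier, endowment=20):
    """Return each player's final tokens after one round of the game."""
    n = len(contributions)
    pool = multiplier * sum(contributions)
    share = pool / n  # every player receives the same share of the pool
    return [endowment - c + share for c in contributions]


# Hypothetical setup: four players and a pool that is doubled.
everyone_contributes = public_goods_payoffs([20, 20, 20, 20], multiplier=2)
one_free_rider = public_goods_payoffs([0, 20, 20, 20], multiplier=2)
nobody_contributes = public_goods_payoffs([0, 0, 0, 0], multiplier=2)

print(everyone_contributes)  # each player ends with 40 tokens
print(one_free_rider)        # the free rider ends with 50, the others with 30
print(nobody_contributes)    # each player keeps the original 20
```

In this toy setup the free rider does better than any contributor, yet a group of free riders ends up with half of what a group of full contributors would get.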

## Rationality for Bayesian agents

One of the requirements of our objective versus subjective rationality model is to have learning agents that act rationally on their subjective representation of the world. The easiest parameter to consider learning is the probability of agents around you cooperating or defecting. In a one-shot game without memory, your partner cannot condition their strategy on your current (or previous) actions directly. However, we don’t want to build knowledge of this into the agents, so we will allow them to learn the conditional probabilities $p$ of seeing a cooperation if they cooperate, and $q$ of seeing a cooperation if they defect. If the agent’s learning accurately reflects the world then we will have $p = q$.

For now, let us consider learning $p$; the other case will be analogous. In order to be rational, we will require the agent to use Bayesian inference. The hypotheses will be $H_x$ for $0 \leq x \leq 1$, meaning that the partner has a probability $x$ of cooperation. The agent’s mind is then some probability distribution $f(x)$ over $H_x$, with the expected value of $f$ being $p$. Let us look at how the moments of $f(x)$ change with observations.

Suppose we have some initial distribution $f_0(x)$, with moments $m_{0,k} = \mathbb{E}_{f_0}[x^k]$. If we know the moments up to step $t$ then how will they behave at the next time step? Assume the partner cooperated:

$\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(C|H_x) f_t(x) }{ P(C) } dx \\ & = \frac{1}{ m_{t,1} } \int_0^1 x^{k + 1} f_t(x) dx \\ & = \frac{ m_{t,k+1} }{ m_{t,1} } \end{aligned}$

If the partner defected:

$\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(D|H_x) f_t(x) }{ P(D) } dx \\ & = \frac{1}{ 1 - m_{t,1} } \int_0^1 (x^k - x^{k + 1}) f_t(x) dx \\ & = \frac{ m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \end{aligned}$
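The two update rules can be sanity-checked numerically. The sketch below is only an illustration under a simplifying assumption: the continuous belief $f_t(x)$ is replaced by a discrete distribution on a grid of hypotheses, so the integrals become sums. The grid size, the flat starting belief, and the function names are all hypothetical choices.

```python
def moment(xs, weights, k):
    """k-th moment of a discrete distribution placed on the points xs."""
    return sum(w * x**k for x, w in zip(xs, weights))


def bayes_update(xs, weights, cooperated):
    """Reweight each hypothesis H_x by its likelihood, then renormalize."""
    likelihoods = [x if cooperated else 1.0 - x for x in xs]
    unnormalized = [lik * w for lik, w in zip(likelihoods, weights)]
    total = sum(unnormalized)  # P(C) or P(D) under the current belief
    return [u / total for u in unnormalized]


# Hypothetical grid of 1001 hypotheses with a flat starting belief.
xs = [i / 1000 for i in range(1001)]
belief = [1 / len(xs)] * len(xs)

after_c = bayes_update(xs, belief, cooperated=True)
after_d = bayes_update(xs, belief, cooperated=False)

k = 3
m1 = moment(xs, belief, 1)
# Direct moments of the updated beliefs (left-hand sides) ...
lhs_c = moment(xs, after_c, k)
lhs_d = moment(xs, after_d, k)
# ... against the moment recurrences (right-hand sides).
rhs_c = moment(xs, belief, k + 1) / m1
rhs_d = (moment(xs, belief, k) - moment(xs, belief, k + 1)) / (1 - m1)

print(abs(lhs_c - rhs_c), abs(lhs_d - rhs_d))  # both differences are tiny
```

The agreement here is up to floating-point rounding, since the same algebra that gives the integral identities also holds for sums.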

Although tracking moments is easier than updating the whole distribution and sufficient for recovering the quantity of interest ($p$ — average probability of cooperation over $H_x$), it can be further simplified. If $f_0$ is the uniform distribution, then $m_{0,k} = \frac{1}{k + 1}$. What are the moments doing at later times? They’re just counting, which we will prove by induction.

Our inductive hypothesis is that after $t$ observations, with $c$ of them being cooperations (and thus $d = t - c$ defections), we have:

$m_{t,k} = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 2)(c + d + 3)...(c + d + k + 1)}$.

Note that this hypothesis implies that

$m_{t,k+1} = m_{t,k}\frac{c + k + 1}{c + d + k + 2}$.

If we look at the base case of $t = 0$ (and thus $c = d = 0$) then this simplifies to

$m_{0,k} = \frac{1 \cdot 2 \cdot ... \cdot k}{2 \cdot 3 \cdot ... \cdot (k + 1)} = \frac{k!}{(k + 1)!} = \frac{1}{k + 1}$.

Our base case is met, so let us consider a step. Suppose that our $(t + 1)$-st observation is a cooperation, then we have:

$\begin{aligned} m_{t + 1,k} & = \frac{ m_{t,k+1} }{ m_{t,1} } \\ & = \frac{ c + d + 2 }{c + 1} \frac{(c + 1)(c + 2)...(c + k + 1)}{(c + d + 2)(c + d + 3)...(c + d + k + 2)} \\ & = \frac{(c + 2)...(c + k + 1)}{(c + d + 3)...(c + d + k + 2)} \\ & = \frac{((c + 1) + 1)((c + 1) + 2)...((c + 1) + k)}{((c + 1) + d + 2)((c + 1) + d + 3)...((c + 1) + d + k + 1)} \end{aligned}$

Where the last line is exactly what we expect: observing a cooperation at step $t+1$ means we have seen a total of $c + 1$ cooperations.

If we observe a defection on step $t + 1$, instead, then we have:

$\begin{aligned} m_{t + 1,k} & = \frac{m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k} \left( 1 - \frac{c + k + 1}{c + d + k + 2} \right) \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k} \frac{d + 1}{c + d + k + 2} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 3)...(c + d + k + 1)(c + d + k + 2)} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + (d + 1) + 2)(c + (d + 1) + 3)...(c + (d + 1) + k + 1)} \end{aligned}$

Which is also exactly what we expect: observing a defection at step $t+1$ means we have seen a total of $d + 1$ defections. This completes our proof by induction, and means that our agents need only store the number of cooperations and defections they have experienced.
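For readers who like to see an induction checked mechanically, here is a small sketch that applies the two moment updates step by step with exact rational arithmetic and compares the result against the closed form. The observation history and the function names are hypothetical; any sequence of cooperations and defections should work.

```python
from fractions import Fraction


def closed_form(c, d, k):
    """m_{t,k} after c cooperations and d defections, from a uniform prior."""
    value = Fraction(1)
    for i in range(1, k + 1):
        value *= Fraction(c + i, c + d + i + 1)
    return value


def update(moments, cooperated):
    """Apply one moment update; moments[k] holds m_{t,k}, with moments[0] == 1.

    The update for order k needs order k + 1, so the list gets one shorter.
    """
    m1 = moments[1]
    if cooperated:
        return [moments[k + 1] / m1 for k in range(len(moments) - 1)]
    return [(moments[k] - moments[k + 1]) / (1 - m1)
            for k in range(len(moments) - 1)]


observations = [True, False, True, True, False]  # hypothetical history
max_k = 4
# Start with enough moments of the uniform prior to survive every update.
moments = [Fraction(1, k + 1) for k in range(max_k + len(observations) + 1)]
for cooperated in observations:
    moments = update(moments, cooperated)

c = sum(observations)
d = len(observations) - c
matches = all(moments[k] == closed_form(c, d, k) for k in range(max_k + 1))
print(matches, moments[1])  # the first moment is (c + 1) / (c + d + 2)
```

Because `Fraction` avoids rounding, the comparison is an exact equality rather than an approximate one.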

I suspect the above theorem is taught in any first statistics course. Unfortunately, I’ve never had a stats class, so I had to recreate the theorem here. If you know the name of this result then please leave it in the comments. For those that haven’t seen this before, I think it is nice to see explicitly how rationally estimating probabilities based on past data reduces to counting that data.

Our agents are then described by two numbers giving their genotype, and four for their mind. For the genotype, there are the values of $U$ and $V$, which mean that the agent thinks it is playing the following cooperate-defect game:

$\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$

For the agent’s mind, we have $n_{CC}, n_{CD}$, which are the numbers of cooperations and defections the agent saw after cooperating, and $n_{DC}, n_{DD}$, which are the same following a defection. From these values and the theorem we just proved, the agent knows that $p = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$ and $q = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$. With these values, the agent can calculate the expected subjective utility of cooperating and defecting:

$\begin{aligned} \text{Util}(C) & = p + (1 - p)U \\ \text{Util}(D) & = qV \end{aligned}$

If $\text{Util}(C) > \text{Util}(D)$ then the agent will cooperate; otherwise it will defect. This has a risk of locking an agent into one action forever, say cooperate, and then never having a chance to sample results for defection and thus never updating $q$. To avoid this, we use the trembling-hand mechanism (or $\epsilon$-greedy reinforcement learning): with small probability $\epsilon$ the agent performs the opposite action of what it intended.
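Putting the pieces together, the agent described above might be sketched as follows. This is an illustration rather than the code used for our simulations: the class name, the method names, the default value of `epsilon`, and the example values of $U$ and $V$ are all assumptions made for the sketch.

```python
import random


class SubjectiveAgent:
    """Agent acting rationally on its subjective game [[1, U], [V, 0]]."""

    def __init__(self, U, V, epsilon=0.05, rng=None):
        self.U, self.V, self.epsilon = U, V, epsilon
        self.rng = rng or random.Random()
        # counts[(my_action, partner_action)]: n_CC, n_CD, n_DC, n_DD
        self.counts = {(a, b): 0 for a in "CD" for b in "CD"}

    def estimates(self):
        """Point estimates p and q obtained by counting past interactions."""
        n = self.counts
        p = (n["C", "C"] + 1) / (n["C", "C"] + n["C", "D"] + 2)
        q = (n["D", "C"] + 1) / (n["D", "C"] + n["D", "D"] + 2)
        return p, q

    def intended_action(self):
        """Action with the higher expected subjective utility."""
        p, q = self.estimates()
        util_c = p + (1 - p) * self.U
        util_d = q * self.V
        return "C" if util_c > util_d else "D"

    def act(self):
        """Intended action, flipped with probability epsilon."""
        action = self.intended_action()
        if self.rng.random() < self.epsilon:  # trembling hand
            action = "D" if action == "C" else "C"
        return action

    def observe(self, my_action, partner_action):
        self.counts[my_action, partner_action] += 1


# Hypothetical genotype: the agent believes it is in a prisoner's dilemma.
agent = SubjectiveAgent(U=-0.5, V=1.5, epsilon=0.0)
first_choice = agent.intended_action()
for _ in range(3):
    agent.observe("C", "C")
for _ in range(2):
    agent.observe("D", "D")
later_choice = agent.intended_action()
print(first_choice, later_choice, agent.estimates())
```

In this toy run the agent starts out intending to defect, but after seeing cooperation follow its cooperations and defection follow its defections it switches to cooperating, even though its subjective game has not changed.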

The above agent is rational with respect to its subjective state $U,V,p,q$ but could be acting irrationally with respect to the objective game $\begin{pmatrix}1 & X \\ Y & 0 \end{pmatrix}$ and proportion of cooperation $r$.
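As a small illustration of this gap, the sketch below applies the same decision rule twice: once with the objective payoffs $X, Y$ and the true proportion of cooperation $r$, and once with a subjective genotype $U, V$. All of the numbers and the function name are hypothetical, chosen only so that the two answers disagree.

```python
def best_action(coop_prob_if_c, coop_prob_if_d, sucker, temptation):
    """Higher-utility action in the game [[1, sucker], [temptation, 0]]."""
    util_c = coop_prob_if_c + (1 - coop_prob_if_c) * sucker
    util_d = coop_prob_if_d * temptation
    return "C" if util_c > util_d else "D"


# Hypothetical numbers: objectively the game is a prisoner's dilemma, but
# the agent's genes tell it that cooperating is always the better move.
r = 0.6
objective = best_action(r, r, sucker=-0.5, temptation=1.5)       # X, Y, r
subjective = best_action(0.6, 0.6, sucker=0.5, temptation=0.8)   # U, V, p, q

print(objective, subjective)  # the two perspectives recommend different actions
```

Here the agent has learned $p = q = r$ accurately and still cooperates where the objective payoffs favor defection, because the mismatch sits in its utilities rather than in its beliefs.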