January 28, 2013
by Artem Kaznatcheev

One of the requirements of our objective versus subjective rationality model is to have learning agents that act rationally on their subjective representation of the world. The easiest parameter to consider learning is the probability of agents around you cooperating or defecting. In a one-shot game without memory, your partner cannot condition their strategy on your current (or previous) actions directly. However, we don’t want to build knowledge of this into the agents, so we will allow them to learn the conditional probabilities $p_c$ of seeing a cooperation if they cooperate, and $p_d$ of seeing a cooperation if they defect. If the agents’ learning accurately reflects the world then we will have $p_c = p_d$.

For now, let us consider learning $p_c$; the other case will be analogous. In order to be rational, we will require the agent to use Bayesian inference. The hypotheses will be $h_x$ for $x \in [0,1]$, meaning that the partner has a probability $x$ of cooperation. The agent’s mind is then some probability distribution $f$ over $[0,1]$, with the expected value of $x$ being $p_c$. Let us look at how the moments of $f$ change with observations.

Suppose we have some initial distribution $f^0$, with moments $m^0_k = \int_0^1 x^k f^0(x)\,dx$. If we know the moments up to step $n$ then how will they behave at the next time step? Assume the partner cooperated; Bayes’ rule gives the posterior $f^{n+1}(x) = \frac{x f^n(x)}{m^n_1}$, and so:

$m^{n+1}_k = \int_0^1 x^k \frac{x f^n(x)}{m^n_1}\,dx = \frac{m^n_{k+1}}{m^n_1}$
If the partner defected, the posterior is $f^{n+1}(x) = \frac{(1-x) f^n(x)}{1 - m^n_1}$, and so:

$m^{n+1}_k = \int_0^1 x^k \frac{(1-x) f^n(x)}{1 - m^n_1}\,dx = \frac{m^n_k - m^n_{k+1}}{1 - m^n_1}$
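These update rules can be sanity-checked numerically. Below is a minimal Python sketch; the grid resolution and the particular prior are illustrative choices of mine, not part of the model:

```python
import numpy as np

# Discretize the hypotheses h_x on a grid over [0, 1]; the grid size and
# the prior below are illustrative choices, not part of the model.
x = np.linspace(0.0, 1.0, 200_001)
dx = x[1] - x[0]
f = 6.0 * x * (1.0 - x)  # some prior mind f^0 over the hypotheses

def moment(g, k):
    """k-th moment of a gridded distribution, by Riemann sum."""
    return float(np.sum(x ** k * g) * dx)

m1, m2, m3 = moment(f, 1), moment(f, 2), moment(f, 3)

# Partner cooperated: posterior is x f(x) / m_1, so m_k becomes m_{k+1} / m_1.
f_coop = x * f / m1
assert abs(moment(f_coop, 2) - m3 / m1) < 1e-9

# Partner defected: posterior is (1 - x) f(x) / (1 - m_1),
# so m_k becomes (m_k - m_{k+1}) / (1 - m_1).
f_defect = (1.0 - x) * f / (1.0 - m1)
assert abs(moment(f_defect, 2) - (m2 - m3) / (1.0 - m1)) < 1e-9
```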
Although tracking moments is easier than updating the whole distribution, and is sufficient for recovering the quantity of interest ($m_1 = p_c$, the average probability of cooperation over $f$), it can be simplified further. If $f^0$ is the uniform distribution, then $m^0_k = \frac{1}{k+1}$. What are the moments doing at later times? They’re just counting, which we will prove by induction.

Our inductive hypothesis is that after $n$ observations, with $c$ of them being cooperation (and thus $n - c$ defection), we have:

$m^n_k = \frac{(c+k)!\,(n+1)!}{c!\,(n+k+1)!}$.

Note that this hypothesis implies that

$m^n_{k+1} = \frac{c+k+1}{n+k+2}\,m^n_k$.

If we look at the base case of $n = 0$ (and thus $c = 0$) then this simplifies to

$m^0_k = \frac{k!\,1!}{0!\,(k+1)!} = \frac{1}{k+1}$,

which matches the moments of the uniform distribution.

Our base case is met, so let us consider a step. Suppose that our $(n+1)$-st observation is a cooperation; then, using $m^n_1 = \frac{c+1}{n+2}$, we have:

$m^{n+1}_k = \frac{m^n_{k+1}}{m^n_1} = \frac{c+k+1}{n+k+2}\,m^n_k \cdot \frac{n+2}{c+1} = \frac{((c+1)+k)!\,((n+1)+1)!}{(c+1)!\,((n+1)+k+1)!}$.

The last expression is exactly what we expect: observing a cooperation at step $n+1$ means we have seen a total of $c+1$ cooperations.

If we observe a defection on step $n+1$ instead, then we have:

$m^{n+1}_k = \frac{m^n_k - m^n_{k+1}}{1 - m^n_1} = \frac{n-c+1}{n+k+2}\,m^n_k \cdot \frac{n+2}{n-c+1} = \frac{(c+k)!\,((n+1)+1)!}{c!\,((n+1)+k+1)!}$
This is also exactly what we expect: observing a defection at step $n+1$ means we have seen a total of $n+1-c$ defections. This completes our proof by induction, and means that our agents need only store the number of cooperations and defections they have experienced.
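As a sanity check on the counting result, the sketch below compares a brute-force grid posterior against the closed form $\frac{c+1}{n+2}$ for the first moment. The grid size, random seed, and the stand-in partner’s cooperation rate are arbitrary choices of mine:

```python
import numpy as np

# Discretize the hypotheses h_x over [0, 1]; grid size is an arbitrary choice.
x = np.linspace(0.0, 1.0, 100_001)
dx = x[1] - x[0]
f = np.ones_like(x)  # uniform prior, so m^0_k = 1/(k+1)

n, c = 0, 0
rng = np.random.default_rng(1)
for _ in range(40):
    cooperated = bool(rng.random() < 0.6)  # stand-in partner, rate arbitrary
    f = x * f if cooperated else (1.0 - x) * f  # Bayes update on the grid
    f /= np.sum(f) * dx  # renormalize
    n += 1
    c += cooperated

posterior_mean = float(np.sum(x * f) * dx)
# The brute-force posterior mean matches the counting formula (c+1)/(n+2).
assert abs(posterior_mean - (c + 1) / (n + 2)) < 1e-6
```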

I suspect the above theorem is taught in any first statistics course; unfortunately, I’ve never had a stats class, so I had to recreate the theorem here. If you know the name of this result then please leave it in the comments. For those that haven’t seen this before, I think it is nice to see explicitly how rationally estimating probabilities from past data reduces to counting that data.

Our agents are then described by two numbers giving their genotype, and four for their mind. For the genotype, there are the values of $U$ and $V$, which mean that the agent thinks it is playing the following cooperate-defect game:

$\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$
For the agents’ mind, we have $(a_c, b_c)$, the number of cooperations and defections the agent saw after it cooperated, and $(a_d, b_d)$, the same following a defection. From these values and the theorem we just proved, the agent knows that $p_c = \frac{a_c + 1}{a_c + b_c + 2}$ and $p_d = \frac{a_d + 1}{a_d + b_d + 2}$. With these values, the agent can calculate the expected subjective utility of cooperating and of defecting:

$\langle U_C \rangle = p_c \cdot 1 + (1 - p_c)U \qquad \langle U_D \rangle = p_d V + (1 - p_d) \cdot 0 = p_d V$
If $\langle U_C \rangle > \langle U_D \rangle$ then the agent will cooperate; otherwise it defects. This has a risk of locking an agent into one action forever, say cooperate, and then never having a chance to sample the results of defection and thus never update $p_d$. To avoid this, we use the trembling-hand mechanism (or $\epsilon$-greedy reinforcement learning): with small probability $\epsilon$ the agent performs the opposite of its intended action.
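The whole decision rule can be sketched as a small class. This is a hypothetical implementation of mine, assuming a subjective payoff matrix with entries $1, U, V, 0$ (mutual cooperation pays 1, mutual defection pays 0); the names and the default value of $\epsilon$ are illustrative:

```python
import random

class BayesianAgent:
    """Hypothetical sketch: a rational agent over subjective estimates p_c, p_d."""

    def __init__(self, U, V, epsilon=0.05):
        self.U, self.V, self.epsilon = U, V, epsilon
        # (a_c, b_c): cooperations/defections seen after we cooperated;
        # (a_d, b_d): the same, after we defected.
        self.a_c = self.b_c = self.a_d = self.b_d = 0

    def p_c(self):
        return (self.a_c + 1) / (self.a_c + self.b_c + 2)

    def p_d(self):
        return (self.a_d + 1) / (self.a_d + self.b_d + 2)

    def choose(self):
        # Expected subjective utilities, assuming payoffs [[1, U], [V, 0]].
        u_coop = self.p_c() * 1 + (1 - self.p_c()) * self.U
        u_defect = self.p_d() * self.V
        intended = 'C' if u_coop > u_defect else 'D'
        # Trembling hand: flip the intended action with probability epsilon.
        if random.random() < self.epsilon:
            return 'D' if intended == 'C' else 'C'
        return intended

    def observe(self, my_action, partner_cooperated):
        if my_action == 'C':
            self.a_c += partner_cooperated  # bools count as 0/1
            self.b_c += not partner_cooperated
        else:
            self.a_d += partner_cooperated
            self.b_d += not partner_cooperated
```

For example, a fresh agent has $p_c = p_d = \frac{1}{2}$; after one observed cooperation following its own cooperation, $p_c$ rises to $\frac{2}{3}$.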

The above agent is rational with respect to its subjective state, but could be acting irrationally with respect to the objective game and the true proportion of cooperation.

## Rationality, the Bayesian mind and their limits

September 7, 2019 by Artem Kaznatcheev

Bayesianism is one of the more popular frameworks in cognitive science. Alongside similar probabilistic models of cognition, it is highly encouraged in the cognitive sciences (Chater, Tenenbaum, & Yuille, 2006). To summarize Bayesianism far too succinctly: it views the human mind as full of beliefs that we hold to be true with some subjective probability. We then act on these beliefs to maximize expected return (or maybe just satisfice) and update the beliefs according to Bayes’ law. For a better overview, I would recommend the foundational work of Tom Griffiths (in particular, see Griffiths & Yuille, 2008; Perfors et al., 2011).

This use of Bayes’ law has led to a widespread association of Bayesianism with rationality, especially across the internet in places like LessWrong — Kat Soja has written a good overview of Bayesianism there. I’ve already written a number of posts about the dangers of fetishizing rationality and some approaches to addressing them; including bounded rationality, Baldwin effect, and interface theory. In some of these, I’ve touched on Bayesianism. I’ve also written about how to design Bayesian agents for simulations in cognitive science and evolutionary game theory, and even connected it to quasi-magical thinking and Hofstadter’s superrationality in Kaznatcheev, Montrey & Shultz (2010; see also Masel, 2007).

But I haven’t written about Bayesianism itself.

In this post, I want to focus on some of the challenges faced by Bayesianism and the associated view of rationality. And maybe point to some approaches to resolving them. This is based in part on three old questions from the Cognitive Sciences StackExchange: What are some of the drawbacks to probabilistic models of cognition?; What tasks does Bayesian decision-making model poorly?; and What are popular rationalist responses to Tversky & Shafir?

