## Rationality, the Bayesian mind and their limits

Bayesianism is one of the more popular frameworks in cognitive science. Alongside other similar probabilistic models of cognition, it is strongly promoted in the field (Chater, Tenenbaum, & Yuille, 2006). To summarize Bayesianism far too succinctly: it views the human mind as full of beliefs that we hold to be true with some subjective probability. We then act on these beliefs to maximize expected return (or maybe just satisfice) and update the beliefs according to Bayes’ law. For a better overview, I would recommend the foundational work of Tom Griffiths (in particular, see Griffiths & Yuille, 2008; Perfors et al., 2011).

This use of Bayes’ law has led to a widespread association of Bayesianism with rationality, especially across the internet in places like LessWrong — Kaj Sotala has written a good overview of Bayesianism there. I’ve already written a number of posts about the dangers of fetishizing rationality and some approaches to addressing them, including bounded rationality, the Baldwin effect, and interface theory. In some of these, I’ve touched on Bayesianism. I’ve also written about how to design Bayesian agents for simulations in cognitive science and evolutionary game theory, and even connected it to quasi-magical thinking and Hofstadter’s superrationality for Kaznatcheev, Montrey & Shultz (2010; see also Masel, 2007).

But I haven’t written about Bayesianism itself.

In this post, I want to focus on some of the challenges faced by Bayesianism and the associated view of rationality. And maybe point to some approaches for resolving them. This is based in part on three old questions from the Cognitive Sciences StackExchange: What are some of the drawbacks to probabilistic models of cognition?; What tasks does Bayesian decision-making model poorly?; and What are popular rationalist responses to Tversky & Shafir?

## Emotional contagion and rational argument in philosophical texts

Last week I returned to blogging with some reflections on reading and the written word more generally. Originally, I was aiming to write a response to Roger Schank’s stance that “reading is no way to learn”, but I wandered off on too many tangents for a single post or for a coherent argument. The tangent that I left for this post is the role of emotion and personality in philosophical texts.

In my last entry, I focused on the medium-independent aspects of Schank’s argument, and identified two dimensions along which a piece of media and our engagement with it can vary: (1) passive consumption versus active participation, and (2) the level of personalization. The first continuum has a clearly better end on the side of more active engagement. If we are comparing mediums then we should prefer ones that foster more active engagement from the participants. The second dimension is more ambiguous: sometimes a more general piece of media is better than a bespoke piece. What is better becomes particularly ambiguous when being forced to adapt a general approach to your special circumstances encourages more active engagement.

In this post, I will shift focus from comparing mediums to a particular aspect of text and arguments: emotional engagement. Of course, this also shows up in other mediums, but my goal this time is not to argue across mediums.

## Passive vs. active reading and personalization

As you can probably tell, dear reader, recently I have been spending too much time reading and not enough time writing. The blog has been silent. What better way to break this silence than to write a defense of reading? Well, sort of. It would not be much of an eye-opener for you — nor a challenge for me — to simply argue for reading. Given how you are consuming this content, you probably already think that the written word is a worthwhile medium. Given how I am presenting myself, I probably think the same. But are our actions really an endorsement of reading or just the form of communication we begrudgingly resort to because of a lack of better alternatives?

Ostensibly this post will be a qualified defense against an attack on reading by Roger Schank at Education Outrage. Although it is probably best to read it as just a series of reflections on my own experience.[1]

I will focus on the medium-independent aspects of learning that I think give weight to Schank’s argument: the distinction between passive and active learning, and the level of personalization. This will be followed next week by a tangent discussion on the importance of emotional aspects of the text, and close with some reflections on the role of literary value, historic context, and fiction in philosophical arguments. This last point is prompted more by my recent readings of Plato than by Schank. In other words, much like last year, I will rely on Socrates to help get me out of a writing slump.

## Memes, compound strategies, and factoring the replicator equation

When you work with evolutionary game theory for a while, you end up accumulating an arsenal of cute tools and tricks. A lot of them are obvious once you’ve seen them, but you usually wouldn’t bother looking for them if you hadn’t known they existed. In particular, you become very good friends with the replicator equation. A trick that I find useful at times — and that has come up recently in my on-going project with Robert Vander Velde, David Basanta, and Jacob Scott — is nesting replicator dynamics (or the dual notion of factoring the replicator equation). I wanted to share a relatively general version of this trick with you, and provide an interpretation of it that is of interest to people — like me — who care about the interaction of evolution and learning. In particular, we will consider a world of evolving agents where each agent is complex enough to learn through reinforcement and pass its knowledge to its offspring. We will see that in this setting, the dynamics of the basic ideas — or memes — that the agents consider can be studied in a world of selfish memes independent of the agents that host them.
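To make the factoring concrete, here is a minimal Python sketch (my own illustration with made-up fitness values, not code from the project mentioned above). It uses discrete-time replicator dynamics and a fitness that factors multiplicatively over (host, meme) pairs; in that case an initially product-form population stays product-form, and the meme frequencies follow their own replicator dynamics, independent of the hosts:

```python
import numpy as np

def discrete_replicator(x, fitness):
    """One generation of discrete-time replicator dynamics: x'_i is proportional to x_i * fitness_i."""
    x = x * fitness
    return x / x.sum()

# Compound strategies (host type i, meme j) with multiplicative fitness F[i, j] = A[i] * B[j].
# The numbers below are illustrative, not taken from the post.
A = np.array([1.1, 0.9])            # host-level fitness factors
B = np.array([1.2, 1.0, 0.8])       # meme-level fitness factors
F = np.outer(A, B)                  # fitness of each compound strategy

x = np.array([0.3, 0.7])            # host frequencies
y = np.array([0.2, 0.5, 0.3])       # meme frequencies
z = np.outer(x, y)                  # joint frequencies over compound strategies

for _ in range(100):
    z = discrete_replicator(z.ravel(), F.ravel()).reshape(2, 3)  # full joint dynamics
    y = discrete_replicator(y, B)   # factored: memes evolve as if selfish, ignoring hosts

# the meme marginal of the joint dynamics equals the factored meme-only dynamics
print(np.allclose(z.sum(axis=0), y))   # → True
```

The multiplicative discrete-time form makes the identity exact at every step; the analogous continuous-time statement holds when payoffs are additive across the two levels.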

## Useful delusions, interface theory of perception, and religion

As you can guess from the name, evolutionary game theory (EGT) traces its roots to economics and evolutionary biology. Both of the progenitor fields assume it is impossible, or unreasonably difficult, to observe the internal representations, beliefs, and preferences of the agents they model, and thus adopt a largely behaviorist view. My colleagues and I, however, are interested in looking at learning from the cognitive science tradition. In particular, we are interested in the interaction of evolution and learning. This interaction is not in and of itself innovative: it has been a concern for biologists since Baldwin (1896, 1902), and Smead & Zollman (2009; Smead, 2012) even brought the interaction into an EGT framework and showed that rational learning is not necessarily a ‘fixed-point of Darwinian evolution’. But all the previous work that I’ve encountered at this interface has made a simple implicit assumption, and I wanted to question it.

It is relatively clear that evolution acts objectively and without regard for individual agents’ subjective experience except in so far as that experience determines behavior. On the other hand, learning, from the cognitive sciences perspective at least, acts on the subjective experiences of the agent. There is an inherent tension here between the objective and subjective perspective that becomes most obvious in the social learning setting, but is still present for individual learners. Most previous work has sidestepped this issue by either not delving into the internal mechanism of how agents decide to act — something that is incompatible with the cognitive science perspective — or assuming that subjective representations are true to objective reality — something for which we have no a priori justification.

A couple of years ago, I decided to look at this question directly by developing the objective-subjective rationality model. Marcel and I fleshed out the model by adding a mechanism for simple Bayesian learning; this came with an extra perk of allowing us to adopt Masel’s (2007) approach to looking at quasi-magical thinking as an inferential bias. To round out the team with some cognitive science expertise, we asked Tom to join. A few days ago, after an unhurried pace and over 15 relevant blog posts, we released our first paper on the topic (Kaznatcheev, Montrey & Shultz, 2014) along with its MATLAB code.

## Misleading models: “How learning can guide evolution”

I often see examples of mathematicians, physicists, or computer scientists transitioning into other scientific disciplines and going on to great success. However, the converse is rare, and the only two examples I know are Edward Witten’s transition from an undergrad in history and linguistics to a ground-breaking career in theoretical physics, and Geoffrey Hinton’s transition from an undergrad in experimental psychology to a trend-setting career in artificial intelligence. Although in my mind Hinton is associated with neural networks and deep learning, that isn’t his only contribution in fields close to my heart. As is becoming pleasantly common on TheEGG, this is a connection I would have missed if it wasn’t for Graham Jones‘ insightful comment and subsequent email discussion in early October.

The reason I raise the topic four months later is that the connection continues our exploration of learning and evolution. In particular, Hinton & Nowlan (1987) were the first to show the Baldwin effect in action. They showed how learning can speed up evolution in a model that combined a genetic algorithm with learning by trial and error. Although the model was influential, I fear that it is misleading and that the strength of its results is often misinterpreted. As such, I wanted to explore these shortcomings and spell out what would be a convincing demonstration of a qualitative increase in adaptability due to learning.
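As a concrete reference point, here is a scaled-down Python sketch of the flavor of Hinton & Nowlan’s (1987) model. This is my own reconstruction with a smaller population and fewer learning trials than their 1000 and 1000, so treat it as illustrative rather than a faithful replication:

```python
import random

LOCI = 20       # number of loci, as in Hinton & Nowlan (1987)
TRIALS = 100    # learning attempts per lifetime (scaled down from their 1000)
POP = 200       # population size (scaled down from their 1000)

def lifetime_fitness(genome):
    """Alleles are 1 (correct), 0 (wrong), or '?' (plastic, settable by learning).
    Each trial guesses all '?' loci at random; hitting the all-ones target sooner pays more."""
    if 0 in genome:
        return 1.0  # a hard-wired wrong allele can never be corrected by learning
    unknowns = genome.count('?')
    for t in range(TRIALS):
        if all(random.random() < 0.5 for _ in range(unknowns)):
            return 1.0 + 19.0 * (TRIALS - t) / TRIALS
    return 1.0

def next_generation(pop):
    """Fitness-proportional mating with single-point crossover and no mutation."""
    fits = [lifetime_fitness(g) for g in pop]
    children = []
    for _ in range(len(pop)):
        p1, p2 = random.choices(pop, weights=fits, k=2)
        cut = random.randrange(1, LOCI)
        children.append(p1[:cut] + p2[cut:])
    return children

# initial alleles: 0 and 1 each with probability 1/4, '?' with probability 1/2
pop = [[random.choice([0, 1, '?', '?']) for _ in range(LOCI)] for _ in range(POP)]
for generation in range(10):
    pop = next_generation(pop)
```

The key design point: without learning, only an exactly correct genome scores above the baseline, so selection faces a needle-in-a-haystack landscape with nothing to climb; the learning trials create a gradient around the target, which is the effect whose interpretation the rest of this post questions.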

## Phenotypic plasticity, learning, and evolution

Learning and evolution are eerily similar, yet different.

This tension fuels my interest in understanding how they interact. In the context of social learning, we can think of learning and evolution as different dynamics. For individual learning, however, it is harder to find a difference. On the one hand, this has led learning experts like Valiant (2009) to suggest that evolution is a subset of machine learning. On the other hand, due to its behaviorist roots, a lot of evolutionary thought simply ignored learning or did not treat it explicitly. To find interesting interactions between the two concepts we have to turn to ideas from before the modern synthesis — the Simpson-Baldwin effect (Baldwin 1896, 1902; Simpson, 1953):

## Baldwin effect and overcoming the rationality fetish

G.G. Simpson and J.M. Baldwin

As I’ve mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn’t surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a “perfect Bayesian reasoner as a fixed point of Darwinian evolution”. This lets them side-step observed non-Bayesian behavior in humans by saying that we are evolving towards, but haven’t yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is naive of critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin, then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three-step process (some of which can occur in parallel or partially so):

1. Organisms adapt to the environment individually.
2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that originally individual, non-hereditary adaptations become hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan 1896; Osborn 1896, 1897) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.

## Quasi-magical thinking and superrationality for Bayesian agents

As part of our objective and subjective rationality model, we want a focal agent to learn the probability that others will cooperate given that the focal agent cooperates ($p$) or defects ($q$). In a previous post we saw how to derive point estimates for $p$ and $q$ (and learnt that they are the expected values of the Bayesian posterior):

$p_0 = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$, and $q_0 = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$

where $n_{XY}$ is the number of times Alice displayed behavior $X$ and saw Bob display behavior $Y$. In the above equations, a number like $n_{CD}$ is interpreted by Alice as “the number of times I cooperated and Bob ‘responded’ with a defection”. I put ‘responded’ in quotations because Bob cannot actually condition his behavior on Alice’s action. Note that in this view, Alice is placing herself in a special position of actor, and observing Bob’s behavior in response to her actions; she is failing to put herself in Bob’s shoes. Instead, she can realize that Bob would be interested in doing the same sort of sampling, and interpret $n_{CD}$ more neutrally as “the number of times agent 1 cooperates and agent 2 defects”; in that case she will see that, for Bob, the equivalent quantity is $n_{DC}$.
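In code, these point estimates are just Laplace-smoothed frequencies. A trivial Python sketch (the function name is mine):

```python
def point_estimate(n_seen_C, n_seen_D):
    """Posterior-mean estimate of the partner's cooperation probability,
    given counts of cooperations and defections observed after one's own action."""
    return (n_seen_C + 1) / (n_seen_C + n_seen_D + 2)

# with no data, the estimate is the uniform-prior mean of 1/2
p0 = point_estimate(n_seen_C=0, n_seen_D=0)   # → 0.5

# after Alice cooperates 5 times and sees Bob cooperate in 3 of them:
p0 = point_estimate(n_seen_C=3, n_seen_D=2)   # (3 + 1)/(5 + 2) = 4/7
```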

## Rationality for Bayesian agents

One of the requirements of our objective versus subjective rationality model is to have learning agents that act rationally on their subjective representation of the world. The easiest parameter to consider learning is the probability of agents around you cooperating or defecting. In a one-shot game without memory, your partner cannot condition their strategy on your current (or previous) actions directly. However, we don’t want to build knowledge of this into the agents, so we will allow them to learn the conditional probabilities $p$ of seeing a cooperation if they cooperate, and $q$ of seeing a cooperation if they defect. If the agent’s learning accurately reflects the world then we will have $p = q$.

For now, let us consider learning $p$, the other case will be analogous. In order to be rational, we will require the agent to use Bayesian inference. The hypotheses will be $H_x$ for $0 \leq x \leq 1$ — meaning that the partner has a probability $x$ of cooperation. The agent’s mind is then some probability distribution $f(x)$ over $H_x$, with the expected value of $f$ being $p$. Let us look at how the moments of $f(x)$ change with observations.

Suppose we have some initial distribution $f_0(x)$, with moments $m_{0,k} = \mathbb{E}_{f_0}[x^k]$. If we know the moments up to step $t$ then how will they behave at the next time step? Assume the partner cooperated:

\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(C|H_x) f_t(x) }{ P(C) } dx \\ & = \frac{1}{ m_{t,1} } \int_0^1 x^{k + 1} f_t(x) dx \\ & = \frac{ m_{t,k+1} }{ m_{t,1} } \end{aligned}

If the partner defected:

\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(D|H_x) f_t(x) }{ P(D) } dx \\ & = \frac{1}{ 1 - m_{t,1} } \int_0^1 (x^k - x^{k + 1}) f_t(x) dx \\ & = \frac{ m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \end{aligned}

Although tracking moments is easier than updating the whole distribution and sufficient for recovering the quantity of interest ($p$ — average probability of cooperation over $H_x$), it can be further simplified. If $f_0$ is the uniform distribution, then $m_{0,k} = \frac{1}{k + 1}$. What are the moments doing at later times? They’re just counting, which we will prove by induction.

Our inductive hypothesis is that after $t$ observations, with $c$ of them being cooperations (and thus $d = t - c$ defections), we have:

$m_{t,k} = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 2)(c + d + 3)...(c + d + k + 1)}$.

Note that this hypothesis implies that

$m_{t,k+1} = m_{t,k}\frac{c + k + 1}{c + d + k + 2}$.

If we look at the base case of $t = 0$ (and thus $c = d = 0$) then this simplifies to

$m_{0,k} = \frac{1\cdot 2 \cdot \ldots \cdot k}{2 \cdot 3 \cdot \ldots \cdot (k + 1)} = \frac{k!}{(k + 1)!} = \frac{1}{k + 1}$.

Our base case is met, so let us consider a step. Suppose that our $t + 1$-st observation is a cooperation, then we have:

\begin{aligned} m_{t + 1,k} & = \frac{ m_{t,k+1} }{ m_{t,1} } \\ & = \frac{ c + d + 2 }{c + 1} \frac{(c + 1)(c + 2)...(c + k + 1)}{(c + d + 2)(c + d + 3)...(c + d + k + 2)} \\ & = \frac{(c + 2)...(c + k + 1)}{(c + d + 3)...(c + d + k + 2)} \\ & = \frac{((c + 1) + 1)((c + 1) + 2)...((c + 1) + k)}{((c + 1) + d + 2)((c + 1) + d + 3)...((c + 1) + d + k + 1)} \end{aligned}.

Where the last line is exactly what we expect: observing a cooperation at step $t+1$ means we have seen a total of $c + 1$ cooperations.

If we observe a defection on step $t + 1$, instead, then we have:

\begin{aligned} m_{t + 1,k} & = \frac{m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k}( 1 - \frac{c + k + 1}{c + d + k + 2}) \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k} \frac{d + 1}{c + d + k + 2} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 3)...(c + d + k + 1)(c + d + k + 2)} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + (d + 1) + 2)(c + (d + 1) + 3)...(c + (d + 1) + k + 1)} \end{aligned}

Which is also exactly what we expect: observing a defection at step $t+1$ means we have seen a total of $d + 1$ defections. This completes our proof by induction, and means that our agents only need to store the number of cooperations and defections they have experienced.
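The induction can also be checked numerically. Below is a short Python sketch (my own, using exact rational arithmetic via `fractions`) that runs the moment recursions derived above, starting from the uniform prior, and compares the result to the closed-form counting formula. Since each update consumes the top moment, we seed more moments than we make observations:

```python
from fractions import Fraction

def closed_form(c, d, k):
    """m_{t,k} after c cooperations and d defections (uniform prior), per the induction."""
    m = Fraction(1)
    for j in range(1, k + 1):
        m *= Fraction(c + j, c + d + j + 1)
    return m

def update(m, cooperated):
    """One step of the moment recursions derived above; consumes the top moment."""
    if cooperated:
        return [m[k + 1] / m[1] for k in range(len(m) - 1)]
    return [(m[k] - m[k + 1]) / (1 - m[1]) for k in range(len(m) - 1)]

# moments of the uniform prior: m_{0,k} = 1/(k+1), tracked up to k = 8
m = [Fraction(1, k + 1) for k in range(9)]
obs = [True, False, True, True, False]   # C, D, C, C, D
for o in obs:
    m = update(m, o)

c, d = sum(obs), len(obs) - sum(obs)     # 3 cooperations, 2 defections
print(all(m[k] == closed_form(c, d, k) for k in range(len(m))))   # → True
```

In particular, the first moment comes out as $m_{5,1} = \frac{3 + 1}{3 + 2 + 2} = \frac{4}{7}$, matching the counting estimate.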

I suspect the above theorem is taught in any first statistics course; unfortunately, I’ve never had a stats class, so I had to recreate the theorem here. If you know the name of this result then please leave it in the comments. For those that haven’t seen this before, I think it is nice to see explicitly how rationally estimating probabilities from past data reduces to counting that data.

Our agents are then described by two numbers giving their genotype, and four for their mind. For the genotype, there are the values of $U$ and $V$, which mean that the agent thinks it is playing the following cooperate-defect game:

$\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$

For the agents’ mind, we have $n_{CC}, n_{CD}$, which are the numbers of cooperations and defections the agent saw after cooperating, and $n_{DC}, n_{DD}$, which are the same following a defection. From these values and the theorem we just proved, the agent knows that $p = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$ and $q = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$. With these values, the agent can calculate the expected subjective utility of cooperating and defecting:

\begin{aligned} \text{Util}(C) & = p + (1 - p)U \\ \text{Util}(D) & = qV \end{aligned}

If $\text{Util}(C) > \text{Util}(D)$ then the agent will cooperate, otherwise — defect. This has a risk of locking an agent into one action forever, say cooperate, and then never having a chance to sample results for defection and thus never updating $q$. To avoid this, we use the trembling-hand mechanism (or $\epsilon$-greedy reinforcement learning): with small probability $\epsilon$ the agent performs the opposite action of what it intended.
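Putting the pieces together, a single decision by such an agent might look like the following Python sketch (my own illustration of the rule above; the function name and the default value of $\epsilon$ are assumptions):

```python
import random

def choose_action(n_CC, n_CD, n_DC, n_DD, U, V, epsilon=0.05):
    """Pick C or D rationally w.r.t. the subjective estimates, with a trembling hand."""
    p = (n_CC + 1) / (n_CC + n_CD + 2)   # estimated P(see C | I cooperate)
    q = (n_DC + 1) / (n_DC + n_DD + 2)   # estimated P(see C | I defect)
    util_C = p * 1 + (1 - p) * U         # subjective expected utility of cooperating
    util_D = q * V + (1 - q) * 0         # subjective expected utility of defecting
    intended = 'C' if util_C > util_D else 'D'
    if random.random() < epsilon:        # the hand trembles: flip the intended action
        return 'D' if intended == 'C' else 'C'
    return intended
```

With $\epsilon = 0$ the rule is deterministic: an agent that mostly saw cooperation after cooperating and defection after defecting will cooperate whenever $U$ and $V$ make $\text{Util}(C)$ the larger of the two.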

The above agent is rational with respect to its subjective state $U,V,p,q$ but could be acting irrationally with respect to the objective game $\begin{pmatrix}1 & X \\ Y & 0 \end{pmatrix}$ and proportion of cooperation $r$.