## Emotional contagion and rational argument in philosophical texts

Last week I returned to blogging with some reflections on reading and the written word more generally. Originally, I was aiming to write a response to Roger Schank’s stance that “reading is no way to learn”, but I wandered off on too many tangents for a single post or for a coherent argument. The tangent that I left for this post is the role of emotion and personality in philosophical texts.

In my last entry, I focused on the medium-independent aspects of Schank’s argument, and identified two dimensions along which a piece of media and our engagement with it can vary: (1) passive consumption versus active participation, and (2) the level of personalization. The first continuum has a clearly better end on the side of more active engagement. If we are comparing mediums then we should prefer ones that foster more active engagement from the participants. The second dimension is more ambiguous: sometimes a more general piece of media is better than a bespoke piece. What is better becomes particularly ambiguous when being forced to adapt a general approach to your special circumstances encourages more active engagement.

In this post, I will shift focus from comparing mediums to a particular aspect of text and arguments: emotional engagement. Of course, this also shows up in other mediums, but my goal this time is not to argue across mediums.

## Passive vs. active reading and personalization

As you can probably tell, dear reader, recently I have been spending too much time reading and not enough time writing. The blog has been silent. What better way to break this silence than to write a defense of reading? Well, sort of. It would not be much of an eye-opener for you — nor a challenge for me — to simply argue for reading. Given how you are consuming this content, you probably already think that the written word is a worthwhile medium. Given how I am presenting myself, I probably think the same. But are our actions really an endorsement of reading or just the form of communication we begrudgingly resort to because of a lack of better alternatives?

Ostensibly this post will be a qualified defense against an attack on reading by Roger Schank at Education Outrage, although it is probably best to read it as just a series of reflections on my own experience.[1]

I will focus on the medium-independent aspects of learning that I think give weight to Schank’s argument: the distinction between passive and active learning, and the level of personalization. This will be followed next week by a tangent discussion on the importance of emotional aspects of the text, before closing with some reflections on the role of literary value, historic context, and fiction in philosophical arguments. This last point is prompted more by my recent readings of Plato than by Schank. In other words, much like last year, I will rely on Socrates to help get me out of a writing slump.

## Memes, compound strategies, and factoring the replicator equation

When you work with evolutionary game theory for a while, you end up accumulating an arsenal of cute tools and tricks. A lot of them are obvious once you’ve seen them, but you usually wouldn’t bother looking for them if you hadn’t known they existed. In particular, you become very good friends with the replicator equation. A trick that I find useful at times — and that has come up recently in my on-going project with Robert Vander Velde, David Basanta, and Jacob Scott — is nesting replicator dynamics (or the dual notion of factoring the replicator equation). I wanted to share a relatively general version of this trick with you, and provide an interpretation of it that is of interest to people — like me — who care about the interaction of evolution and learning. In particular, we will consider a world of evolving agents where each agent is complex enough to learn through reinforcement and pass its knowledge to its offspring. We will see that in this setting, the dynamics of the basic ideas — or memes — that the agents consider can be studied in a world of selfish memes independent of the agents that host them.
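For readers who haven’t played with it before, the replicator equation itself can be sketched in a few lines. This is a generic discrete-time Euler step of my own (the payoff matrix is made up for illustration), not the nesting/factoring trick the post is about:

```python
import numpy as np

# A generic discrete-time Euler step of the replicator equation: strategy
# frequencies grow in proportion to how their fitness compares to the
# population-average fitness.

def replicator_step(x, A, dt=0.01):
    """x: vector of strategy frequencies; A: payoff matrix."""
    fitness = A @ x    # expected payoff of each strategy
    avg = x @ fitness  # population-average payoff
    return x + dt * x * (fitness - avg)

# Hawk-Dove-style payoffs (made up for illustration):
A = np.array([[0.0, 3.0],
              [1.0, 2.0]])
x = np.array([0.9, 0.1])
for _ in range(1000):
    x = replicator_step(x, A)
# x stays on the simplex and drifts toward the interior equilibrium (0.5, 0.5)
```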

## Useful delusions, interface theory of perception, and religion

As you can guess from the name, evolutionary game theory (EGT) traces its roots to economics and evolutionary biology. Both of the progenitor fields assume it impossible, or unreasonably difficult, to observe the internal representations, beliefs, and preferences of the agents they model, and thus adopt a largely behaviorist view. My colleagues and I, however, are interested in looking at learning from the cognitive science tradition. In particular, we are interested in the interaction of evolution and learning. This interaction in and of itself is not innovative; it has been a concern for biologists since Baldwin (1896, 1902), and Smead & Zollman (2009; Smead 2012) even brought the interaction into an EGT framework and showed that rational learning is not necessarily a ‘fixed-point of Darwinian evolution’. But all the previous work that I’ve encountered at this interface has made a simple implicit assumption, and I wanted to question it.

It is relatively clear that evolution acts objectively and without regard for individual agents’ subjective experience except in so far as that experience determines behavior. On the other hand, learning, from the cognitive sciences perspective at least, acts on the subjective experiences of the agent. There is an inherent tension here between the objective and subjective perspective that becomes most obvious in the social learning setting, but is still present for individual learners. Most previous work has sidestepped this issue by either not delving into the internal mechanism of how agents decide to act — something that is incompatible with the cognitive science perspective — or assuming that subjective representations are true to objective reality — something for which we have no a priori justification.

A couple of years ago, I decided to look at this question directly by developing the objective-subjective rationality model. Marcel and I fleshed out the model by adding a mechanism for simple Bayesian learning; this came with an extra perk of allowing us to adopt Masel’s (2007) approach to looking at quasi-magical thinking as an inferential bias. To round out the team with some cognitive science expertise, we asked Tom to join. A few days ago, after an unhurried pace and over 15 relevant blog posts, we released our first paper on the topic (Kaznatcheev, Montrey & Shultz, 2014) along with its MatLab code.

## Misleading models: “How learning can guide evolution”

I often see examples of mathematicians, physicists, or computer scientists transitioning into other scientific disciplines and going on to great success. However, the converse is rare, and the only two examples I know are Edward Witten’s transition from an undergrad in history and linguistics to a ground-breaking career in theoretical physics, and Geoffrey Hinton’s transition from an undergrad in experimental psychology to a trend-setting career in artificial intelligence. Although in my mind Hinton is associated with neural networks and deep learning, that isn’t his only contribution in fields close to my heart. As is becoming pleasantly common on TheEGG, this is a connection I would have missed if it wasn’t for Graham Jones’ insightful comment and subsequent email discussion in early October.

The reason I raise the topic four months later is that the connection continues our exploration of learning and evolution. In particular, Hinton & Nowlan (1987) were the first to show the Baldwin effect in action. They showed how learning can speed up evolution in a model that combined a genetic algorithm with learning by trial and error. Although the model was influential, I fear that it is misleading and that the strength of its results is often misinterpreted. As such, I wanted to explore these shortcomings and spell out what would be a convincing demonstration of a qualitative increase in adaptability due to learning.

## Phenotypic plasticity, learning, and evolution

Learning and evolution are eerily similar, yet different.

This tension fuels my interest in understanding how they interact. In the context of social learning, we can think of learning and evolution as different dynamics. For individual learning, however, it is harder to find a difference. On the one hand, this has led learning experts like Valiant (2009) to suggest that evolution is a subset of machine learning. On the other hand, due to its behaviorist roots, a lot of evolutionary thought simply ignored learning or did not treat it explicitly. To find interesting interactions between the two concepts we have to turn to ideas from before the modern synthesis — the Simpson-Baldwin effect (Baldwin 1896, 1902; Simpson, 1953):

## Baldwin effect and overcoming the rationality fetish

G.G. Simpson and J.M. Baldwin

As I’ve mentioned previously, one of the amazing features of the internet is that you can take almost any idea and find a community obsessed with it. Thus, it isn’t surprising that there is a prominent subculture that fetishizes rationality and Bayesian learning. They tend to accumulate around forums with promising titles like OvercomingBias and Less Wrong. Since these communities like to stay abreast of science, they often offer evolutionary justifications for why humans might be Bayesian learners and claim a “perfect Bayesian reasoner as a fixed point of Darwinian evolution”. This lets them side-step observed non-Bayesian behavior in humans by saying that we are evolving towards, but haven’t yet reached, this (potentially unreachable, but approximable) fixed point. Unfortunately, even the fixed-point argument is naive of critiques like the Simpson-Baldwin effect.

Introduced in 1896 by psychologist J.M. Baldwin then named and reconciled with the modern synthesis by leading paleontologist G.G. Simpson (1953), the Simpson-Baldwin effect posits that “[c]haracters individually acquired by members of a group of organisms may eventually, under the influence of selection, be reenforced or replaced by similar hereditary characters” (Simpson, 1953). More explicitly, it consists of a three step process (some of which can occur in parallel or partially so):

1. Organisms adapt to the environment individually.
2. Genetic factors produce hereditary characteristics similar to the ones made available by individual adaptation.
3. These hereditary traits are favoured by natural selection and spread in the population.

The overall result is that an originally individual, non-hereditary adaptation becomes hereditary. For Baldwin (1896, 1902) and other early proponents (Morgan 1896; Osborn 1896, 1897) this was a way to reconcile Darwinian and strong Lamarckian evolution. With the latter model of evolution exorcised from the modern synthesis, Simpson’s restatement became a paradox: why do we observe the costly mechanism and associated errors of individual learning, if learning does not enhance individual fitness at equilibrium and will be replaced by simpler non-adaptive strategies? This encompasses more specific cases like Rogers’ paradox (Boyd & Richerson, 1985; Rogers, 1988) of social learning.

## Quasi-magical thinking and superrationality for Bayesian agents

As part of our objective and subjective rationality model, we want a focal agent to learn the probability that others will cooperate given that the focal agent cooperates ($p$) or defects ($q$). In a previous post we saw how to derive point estimates for $p$ and $q$ (and learnt that they are the maximum likelihood estimates):

$p_0 = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$, and $q_0 = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$

where $n_{XY}$ is the number of times Alice displayed behavior $X$ and saw Bob display behavior $Y$. In the above equations, a number like $n_{CD}$ is interpreted by Alice as “the number of times I cooperated and Bob ‘responded’ with a defection”. I put ‘responded’ in quotations because Bob cannot actually condition his behavior on Alice’s action. Note that in this view, Alice is placing herself in a special position of actor, and observing Bob’s behavior in response to her actions; she is failing to put herself in Bob’s shoes. Instead, she can realize that Bob would be interested in doing the same sort of sampling, and interpret $n_{CD}$ more neutrally as “number of times agent 1 cooperates and agent 2 defects”, in this case she will see that for Bob, the equivalent quantity is $n_{DC}$.
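These estimates reduce to simple counting, which makes them trivial to implement. A minimal sketch (the function name is my own, not from the paper):

```python
# A minimal sketch of the point estimates above; n_CC, n_CD, n_DC, n_DD are
# Alice's observation counts, as defined in the text.

def point_estimates(n_CC, n_CD, n_DC, n_DD):
    """Return (p_0, q_0): estimated probability the partner cooperates,
    conditioned on Alice having cooperated (p_0) or defected (q_0)."""
    p_0 = (n_CC + 1) / (n_CC + n_CD + 2)
    q_0 = (n_DC + 1) / (n_DC + n_DD + 2)
    return p_0, q_0

# With no observations yet, both estimates sit at the uniform-prior mean:
print(point_estimates(0, 0, 0, 0))  # -> (0.5, 0.5)
```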

## Rationality for Bayesian agents

One of the requirements of our objective versus subjective rationality model is to have learning agents that act rationally on their subjective representation of the world. The easiest parameter to consider learning is the probability of agents around you cooperating or defecting. In a one-shot game without memory, your partner cannot condition their strategy on your current (or previous) actions directly. However, we don’t want to build knowledge of this into the agents, so we will allow them to learn the conditional probabilities $p$ of seeing a cooperation if they cooperate, and $q$ of seeing a cooperation if they defect. If the agent’s learning accurately reflects the world then we will have $p = q$.

For now, let us consider learning $p$, the other case will be analogous. In order to be rational, we will require the agent to use Bayesian inference. The hypotheses will be $H_x$ for $0 \leq x \leq 1$ — meaning that the partner has a probability $x$ of cooperation. The agent’s mind is then some probability distribution $f(x)$ over $H_x$, with the expected value of $f$ being $p$. Let us look at how the moments of $f(x)$ change with observations.

Suppose we have some initial distribution $f_0(x)$, with moments $m_{0,k} = \mathbb{E}_{f_0}[x^k]$. If we know the moments at step $t$ then how will they behave at the next time step? Assume the partner cooperated:

\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(C|H_x) f_t(x) }{ P(C) } dx \\ & = \frac{1}{ m_{t,1} } \int_0^1 x^{k + 1} f_t(x) dx \\ & = \frac{ m_{t,k+1} }{ m_{t,1} } \end{aligned}

If the partner defected:

\begin{aligned} m_{t+1,k} & = \int_0^1 x^k \frac{ P(D|H_x) f_t(x) }{ P(D) } dx \\ & = \frac{1}{ 1 - m_{t,1} } \int_0^1 (x^k - x^{k + 1}) f_t(x) dx \\ & = \frac{ m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \end{aligned}
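Both recursions are easy to check numerically. The sketch below (my own illustration, not code from the paper) discretizes the belief $f_t(x)$ on a grid, performs the Bayesian update directly, and compares the resulting mean against the closed forms above:

```python
import numpy as np

# Numerical sanity check: discretize the belief f_t(x), do the Bayesian
# update directly, and compare the new mean against the closed-form
# recursions m_{t+1,1} = m_{t,2}/m_{t,1} (after a cooperation) and
# m_{t+1,1} = (m_{t,1} - m_{t,2})/(1 - m_{t,1}) (after a defection).

x = np.linspace(0, 1, 10001)
f = np.ones_like(x)  # uniform prior f_0

def moment(f, k):
    """k-th moment of the (unnormalized) distribution f on the grid x."""
    return np.sum(x**k * f) / np.sum(f)

m1, m2 = moment(f, 1), moment(f, 2)

f_C = x * f          # posterior after observing a cooperation
assert abs(moment(f_C, 1) - m2 / m1) < 1e-9

f_D = (1 - x) * f    # posterior after observing a defection
assert abs(moment(f_D, 1) - (m1 - m2) / (1 - m1)) < 1e-9
```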

Although tracking moments is easier than updating the whole distribution and sufficient for recovering the quantity of interest ($p$ — average probability of cooperation over $H_x$), it can be further simplified. If $f_0$ is the uniform distribution, then $m_{0,k} = \frac{1}{k + 1}$. What are the moments doing at later times? They’re just counting, which we will prove by induction.

Our inductive hypothesis is that after $t$ observations, with $c$ of them being cooperations (and thus $d = t - c$ defections), we have:

$m_{t,k} = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 2)(c + d + 3)...(c + d + k + 1)}$.

Note that this hypothesis implies that

$m_{t,k+1} = m_{t,k}\frac{c + k + 1}{c + d + k + 2}$.

If we look at the base case of $t = 0$ (and thus $c = d = 0$) then this simplifies to

$m_{0,k} = \frac{1\cdot 2 \cdot ... \cdot k}{2 \cdot 3 \cdot ... \cdot (k + 1)} = \frac{k!}{(k + 1)!} = \frac{1}{k + 1}$.

Our base case is met, so let us consider a step. Suppose that our $t + 1$-st observation is a cooperation, then we have:

\begin{aligned} m_{t + 1,k} & = \frac{ m_{t,k+1} }{ m_{t,1} } \\ & = \frac{ c + d + 2 }{c + 1} \frac{(c + 1)(c + 2)...(c + k + 1)}{(c + d + 2)(c + d + 3)...(c + d + k + 2)} \\ & = \frac{(c + 2)...(c + k + 1)}{(c + d + 3)...(c + d + k + 2)} \\ & = \frac{((c + 1) + 1)((c + 1) + 2)...((c + 1) + k)}{((c + 1) + d + 2)((c + 1) + d + 3)...((c + 1) + d + k + 1)} \end{aligned}.

The last line is exactly what we expect: observing a cooperation at step $t+1$ means we have seen a total of $c + 1$ cooperations.

If we observe a defection on step $t + 1$, instead, then we have:

\begin{aligned} m_{t + 1,k} & = \frac{m_{t,k} - m_{t,k+1} }{1 - m_{t,1} } \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k}( 1 - \frac{c + k + 1}{c + d + k + 2}) \\ & = \frac{ c + d + 2 }{d + 1} m_{t,k} \frac{d + 1}{c + d + k + 2} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + d + 3)...(c + d + k + 1)(c + d + k + 2)} \\ & = \frac{(c + 1)(c + 2)...(c + k)}{(c + (d + 1) + 2)(c + (d + 1) + 3)...(c + (d + 1) + k + 1)} \end{aligned}

This is also exactly what we expect: observing a defection at step $t+1$ means we have seen a total of $d + 1$ defections. This completes our proof by induction, and means that our agents need only store the number of cooperations and defections they have experienced.
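For the skeptical, the induction step can also be verified mechanically in exact arithmetic. A sketch of my own, using the counting formula above:

```python
from fractions import Fraction

# Exact-arithmetic check that the counting formula satisfies both moment
# recursions, i.e. the induction step of the proof above.

def m(c, d, k):
    """k-th moment after c cooperations and d defections (uniform prior):
    prod_{i=1..k} (c + i) / (c + d + i + 1)."""
    out = Fraction(1)
    for i in range(1, k + 1):
        out *= Fraction(c + i, c + d + i + 1)
    return out

for c in range(5):
    for d in range(5):
        for k in range(1, 5):
            # after a cooperation: m_{t+1,k} = m_{t,k+1} / m_{t,1}
            assert m(c, d, k + 1) / m(c, d, 1) == m(c + 1, d, k)
            # after a defection: m_{t+1,k} = (m_{t,k} - m_{t,k+1}) / (1 - m_{t,1})
            assert (m(c, d, k) - m(c, d, k + 1)) / (1 - m(c, d, 1)) == m(c, d + 1, k)
```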

I suspect the above theorem is taught in any first statistics course; unfortunately, I’ve never had a stats class, so I had to recreate the theorem here. If you know the name of this result then please leave it in the comments. For those that haven’t seen this before, I think it is nice to see explicitly how rationally estimating probabilities based on past data reduces to counting that data.

Our agents are then described by two numbers giving their genotype, and four for their mind. For the genotype, there are the values of $U$ and $V$, meaning that the agent thinks it is playing the following cooperate-defect game:

$\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$

For the agents’ mind, we have $n_{CC}$ and $n_{CD}$, the number of cooperations and defections the agent saw after cooperating, and $n_{DC}$ and $n_{DD}$, the same following a defection. From these values and the theorem we just proved, the agent knows that $p = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$ and $q = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$. With these values, the agent can calculate the expected subjective utility of cooperating and defecting:

\begin{aligned} \text{Util}(C) & = p + (1 - p)U \\ \text{Util}(D) & = qV \end{aligned}

If $\text{Util}(C) > \text{Util}(D)$ then the agent will cooperate, otherwise — defect. This has a risk of locking an agent into one action forever, say cooperate, and then never having a chance to sample results for defection and thus never update $q$. To avoid this, we use the trembling-hand mechanism (or $\epsilon$-greedy reinforcement learning): with small probability $\epsilon$ the agent performs the opposite action of what it intended.
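Putting the pieces together, the full decision rule can be sketched as follows (the function name and the default $\epsilon$ are my own choices for illustration, not the MatLab code from the paper):

```python
import random

# A sketch of the subjectively rational decision rule: estimate p and q from
# the counts, compare subjective expected utilities, and apply the trembling
# hand with probability epsilon.

def choose_action(n_CC, n_CD, n_DC, n_DD, U, V, epsilon=0.05, rng=random):
    p = (n_CC + 1) / (n_CC + n_CD + 2)
    q = (n_DC + 1) / (n_DC + n_DD + 2)
    util_C = p + (1 - p) * U   # subjective expected utility of cooperating
    util_D = q * V             # subjective expected utility of defecting
    intended = 'C' if util_C > util_D else 'D'
    if rng.random() < epsilon:  # trembling hand: perform the opposite action
        return 'D' if intended == 'C' else 'C'
    return intended
```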

The above agent is rational with respect to its subjective state $U,V,p,q$ but could be acting irrationally with respect to the objective game $\begin{pmatrix}1 & X \\ Y & 0 \end{pmatrix}$ and proportion of cooperation $r$.

## Objective and subjective rationality

My colleagues and I share a strong interest in combining learning, development, and evolution. For me, the particular interest is how evolution can build better learners. However, one of the assumptions I often see made implicitly is that the learner understands how game payoffs affect his fitness. In particular, when learning it is usually assumed that the learning feedback signal of “am I doing well?” correlates perfectly with the evolutionary fitness of “am I doing well?” (at least when ignoring inclusive-fitness effects). Marcel Montrey and I decided to question this assumption.

More formally, in the context of evolutionary game theory, there is usually some symmetric game G that is being played by pairs of agents (we will stick to two-player symmetric games for now, but the generalization to non-symmetric and multiplayer games is obvious). When two agents interact, they have some procedure (which could adapt over their lifetime through learning) for picking which strategy they are going to play. I will refer to these strategies by numbers (1, 2, 3, …, n for an n-strategy game) but you could equally well choose a different naming scheme. If Alice chooses strategy i and Bob chooses strategy j, then Alice’s fitness is changed by an amount $G[i,j]$ and Bob’s fitness is changed by an amount $G[j,i]$. The agents will have a chance of reproduction that is proportional to their fitness. In the orthodox learning setting, Alice will also know that she chose i, and have access to her fitness change $G[i,j]$ and maybe even Bob’s choice j. This information allows her to learn and update her procedure for picking her strategy for the next interaction.

My qualm is the fact that Alice is aware of her fitness change $G[i,j]$ and can use that to guide learning. It is not at all obvious to me that a learner would have such information. For instance, if I go to the gym and work out, I feel pain in my muscles which can be seen as a feedback signal saying “don’t do this again, you are hurting yourself and reducing fitness!” However, I am actually increasing my fitness through exercise, and thankfully I have evolved another mechanism that releases dopamine and makes me feel happy about the exercise, giving me a signal “do this again, it is increasing fitness!”. However, this feedback mechanism is under the influence of evolution and there is no a priori reason to believe that this perceived fitness change will correlate with the actual effect on my fitness.

This can be stated more precisely in evolutionary game theory. There is a global ‘real’ game G as described before, however Alice also has her own internal conception of the game $H_A$ and Bob might have a different conception $H_B$. When Alice uses strategy i, and Bob uses strategy j, their fitness is changed according to the real game. Alice’s fitness is changed by $G[i,j]$ and Bob’s by $G[j,i]$. However, their learning algorithm does not know the ‘real’ game and gets no feedback from it at all. Alice knows she did strategy i, and she feels like she received a fitness payoff $H_A[i,j]$, while Bob feels he received a fitness payoff $H_B[j,i]$. However, $H_A$ and $H_B$ might be very different from each other and/or from G. Because of this, even if Alice and Bob have the same learning algorithm, they might behave differently because they will be getting a different feedback signal even if they perform the same action. Therefore, evolution can act on these internal conceptions $H_A$ and $H_B$.
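To make the split concrete, here is a toy illustration of my own (all payoff values made up) where Alice’s fitness changes according to $G$ while her learning signal comes only from $H_A$:

```python
import numpy as np

# A toy illustration of the objective/subjective split: fitness changes
# according to the 'real' game G, but Alice's learning signal comes only
# from her internal conception H_A.

G = np.array([[1.0, 0.2],    # payoff to the row player in the 'real' game
              [0.8, 0.0]])
H_A = np.array([[1.0, -0.5], # Alice's internal conception, different from G
                [1.5, 0.0]])

def interact(i, j):
    """Alice plays i, Bob plays j: return (fitness change, felt payoff)."""
    return G[i, j], H_A[i, j]

fitness, felt = interact(0, 1)
# Alice's fitness rises by 0.2, but she feels a payoff of -0.5: her learning
# rule may steer her away from an action that objectively helps her.
```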

We will fix a learning/inference/production rule for agents, and allow the internal conception to evolve. If the production rule is pure rationality based on your internal conception of the game, then we recover standard evolutionary game theory, and the agents’ genotype can just be their strategy (or at worst a fixed probability distribution over strategies); this doesn’t look at learning. To look at learning Marcel and I decided to use the simplest rational learning procedure: Bayes’ rule. Our agents will use Bayes’ rule to update their expected utility for actions based on observations of previous outcomes according to their internal conception of the game. They will use the strategy that has the highest expected utility. Thus, the agents are rational based on their internal representation of the game: we call this subjective rationality.

To allow the agents to explore different strategies during their lifetime (and not lock into one strategy) we will use a standard technique from economics: the shaky hand. If Alice wants to perform action i then she will try to and succeed with probability $1 - \epsilon$. With probability $\epsilon$ she’ll select one of the n strategies uniformly at random. Of course, to make this shakiness meaningful, we have to select $\epsilon$ high enough (or the expected life-spans of agents have to be long enough) to make sure that with high probability they try each of the n strategies by accident during their lifetime.

For an inviscid (well-mixed) population, I expect agents that have internal conceptions that are qualitatively similar to G will fare better. The population as a whole will converge towards internal conceptions that are consistent with the ‘real’ world, and thus behave with objective rationality. This is a boring control, and the interesting case is structured populations. In that case, I predict that agents will evolve internal conceptions that are not necessarily similar to G. Their conceptions will indirectly take inclusive-fitness effects into account. This will allow for the emergence of objectively irrational behavior, even though the agents’ learning rule is subjectively rational. Specifically, for games on k-regular random graphs, I predict that the internal conceptions will converge towards the Ohtsuki-Nowak transform of G. What does that mean? In future posts Marcel and I will introduce random k-regular graphs and the Ohtsuki-Nowak transform and make this prediction more precise.