January 28, 2013
by Artem Kaznatcheev

One of the requirements of our objective versus subjective rationality model is to have learning agents that act rationally on their subjective representation of the world. The easiest parameter to consider learning is the probability of agents around you cooperating or defecting. In a one-shot game without memory, your partner cannot condition their strategy on your current (or previous) actions directly. However, we don’t want to build knowledge of this into the agents, so we will allow them to learn the conditional probabilities $p$ of seeing a cooperation if they cooperate, and $q$ of seeing a cooperation if they defect. If the agents’ learning accurately reflects the world then we will have $p = q$.

For now, let us consider learning $p$, the probability of seeing a cooperation after cooperating; the other case is analogous. In order to be rational, we will require the agent to use Bayesian inference. The hypotheses will be $H_x$ for $0 \leq x \leq 1$, meaning that the partner has a probability $x$ of cooperation. The agent’s mind is then some probability distribution $f(x)$ over the $H_x$, with the expected value of $f$ being $p$. Let us look at how the moments of $f$ change with observations.

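Suppose we have some initial distribution $f_0(x)$ on $[0,1]$, with moments $m_{0,k} = \int_0^1 x^k f_0(x)\,dx$. If we know the moments $m_{t,k}$ up to step $t$, then how will they behave at the next time step? Assume the partner cooperated. The likelihood of a cooperation under hypothesis $x$ is $x$, and the overall probability of a cooperation is $\int_0^1 x f_t(x)\,dx = m_{t,1}$, so Bayes' rule gives:

$$f_{t+1}(x) = \frac{x f_t(x)}{m_{t,1}}, \qquad m_{t+1,k} = \int_0^1 x^k f_{t+1}(x)\,dx = \frac{m_{t,k+1}}{m_{t,1}}$$

If the partner defected, the likelihood is $1 - x$ and the overall probability of a defection is $1 - m_{t,1}$:

$$f_{t+1}(x) = \frac{(1 - x) f_t(x)}{1 - m_{t,1}}, \qquad m_{t+1,k} = \frac{m_{t,k} - m_{t,k+1}}{1 - m_{t,1}}$$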

Although tracking moments is easier than updating the whole distribution and sufficient for recovering the quantity of interest (the first moment $m_{t,1}$, which is the average probability of cooperation over the hypotheses), it can be further simplified. If the initial distribution $f_0$ is uniform, then $m_{0,k} = \int_0^1 x^k\,dx = \frac{1}{k+1}$. What are the moments doing at later times? They’re just counting, which we will prove by induction.
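Before the proof, the moment updates can be checked numerically. The following is a minimal sketch, assuming a uniform prior and exact rational arithmetic; the function names `uniform_moments` and `update` are hypothetical and chosen only for illustration. Each update consumes the next-higher moment, so the tracked list shrinks by one entry per observation.

```python
from fractions import Fraction

def uniform_moments(n):
    """Moments 1..n of the uniform prior on [0, 1]: the k-th moment is 1/(k+1)."""
    return [Fraction(1, k + 1) for k in range(1, n + 1)]

def update(moments, cooperated):
    """Apply one Bayesian update to the list [m_1, ..., m_n].

    Returns [m_1, ..., m_{n-1}] for the posterior, since the new k-th
    moment depends on the old (k+1)-th moment.
    """
    m1 = moments[0]
    if cooperated:
        # Posterior is proportional to x * f(x).
        return [moments[k + 1] / m1 for k in range(len(moments) - 1)]
    # Posterior is proportional to (1 - x) * f(x).
    return [(moments[k] - moments[k + 1]) / (1 - m1)
            for k in range(len(moments) - 1)]

observations = [True, True, False]  # cooperate, cooperate, defect
moments = uniform_moments(len(observations) + 1)
for obs in observations:
    moments = update(moments, obs)

p_estimate = moments[0]
print(p_estimate)  # 3/5
```

With two cooperations and one defection, the first moment comes out to 3/5, which is (2 + 1)/(2 + 1 + 2): a first hint that the updates reduce to counting.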

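Our inductive hypothesis is that after $t$ observations, with $c$ of them being cooperations (and thus $d = t - c$ defections), the $k$-th moment $m_{t,k}$ satisfies:

$$m_{t,k} = \frac{(c + 1)(c + 2) \cdots (c + k)}{(c + d + 2)(c + d + 3) \cdots (c + d + k + 1)}$$

Note that this hypothesis implies that

$$m_{t,k+1} = m_{t,k} \, \frac{c + k + 1}{c + d + k + 2}$$

If we look at the base case of $t = 0$ (and thus $c = d = 0$) then this simplifies to

$$m_{0,k} = \frac{1 \cdot 2 \cdots k}{2 \cdot 3 \cdots (k + 1)} = \frac{1}{k + 1}$$

which is exactly the $k$-th moment of the uniform distribution.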

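Our base case is met, so let us consider a step. Suppose that our $(t+1)$-st observation is a cooperation; then the cooperation update $m_{t+1,k} = m_{t,k+1} / m_{t,1}$, with $m_{t,1} = \frac{c+1}{c+d+2}$, gives:

$$\begin{aligned} m_{t+1,k} &= \frac{c + d + 2}{c + 1} \cdot \frac{(c + 1)(c + 2) \cdots (c + k + 1)}{(c + d + 2)(c + d + 3) \cdots (c + d + k + 2)} \\ &= \frac{(c + 2) \cdots (c + k + 1)}{(c + d + 3) \cdots (c + d + k + 2)} \\ &= \frac{((c + 1) + 1) \cdots ((c + 1) + k)}{((c + 1) + d + 2) \cdots ((c + 1) + d + k + 1)} \end{aligned}$$

The last line is exactly what we expect: observing a cooperation at step $t + 1$ means we have seen a total of $c + 1$ cooperations.

If we observe a defection on step $t + 1$ instead, then the defection update $m_{t+1,k} = (m_{t,k} - m_{t,k+1}) / (1 - m_{t,1})$ gives:

$$\begin{aligned} m_{t+1,k} &= m_{t,k} \left(1 - \frac{c + k + 1}{c + d + k + 2}\right) \frac{c + d + 2}{d + 1} \\ &= m_{t,k} \, \frac{c + d + 2}{c + d + k + 2} \\ &= \frac{(c + 1) \cdots (c + k)}{(c + (d + 1) + 2) \cdots (c + (d + 1) + k + 1)} \end{aligned}$$

This is also exactly what we expect: observing a defection at step $t + 1$ means we have seen a total of $d + 1$ defections. This completes our proof by induction, and means that our agents need only store the number of cooperations and defections they have experienced.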
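The inductive step can also be checked mechanically. The sketch below assumes the closed form stated in the inductive hypothesis (a product of $k$ ratios of counts) and verifies, with exact fractions, that both update rules map it to the same closed form with one more cooperation or one more defection. The names `moment` and `inductive_step_holds` are hypothetical helpers, not part of the model.

```python
from fractions import Fraction

def moment(c, d, k):
    """Closed form for the k-th moment after c cooperations and d defections:
    (c+1)...(c+k) / ((c+d+2)...(c+d+k+1))."""
    value = Fraction(1)
    for i in range(1, k + 1):
        value *= Fraction(c + i, c + d + i + 1)
    return value

def inductive_step_holds(c, d, k):
    """Check that both Bayesian moment updates preserve the closed form."""
    m1 = moment(c, d, 1)
    mk = moment(c, d, k)
    mk1 = moment(c, d, k + 1)
    after_cooperation = mk1 / m1
    after_defection = (mk - mk1) / (1 - m1)
    return (after_cooperation == moment(c + 1, d, k)
            and after_defection == moment(c, d + 1, k))

all_ok = all(inductive_step_holds(c, d, k)
             for c in range(6) for d in range(6) for k in range(1, 6))
print(all_ok)           # True
print(moment(2, 1, 1))  # 3/5
```

A finite check like this is no substitute for the proof, but it is a cheap guard against sign or index slips in the algebra.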

I suspect the above theorem is taught in any first statistics course; unfortunately, I’ve never had a stats class, so I had to recreate the theorem here. If you know the name of this result then please leave it in the comments. For those that haven’t seen this before, I think it is nice to see explicitly how rationally estimating probabilities based on past data reduces to counting that data.

Our agents are then described by two numbers giving their genotype, and four for their mind. For the genotype, there are the values of $U$ and $V$, which mean that the agent thinks it is playing the following cooperate-defect game:

$$\begin{pmatrix} 1 & U \\ V & 0 \end{pmatrix}$$

For the agent’s mind, we have $n_{CC}, n_{CD}$, which are the numbers of cooperations and defections the agent saw after cooperating, and $n_{DC}, n_{DD}$, which are the same following a defection. From these values and the theorem we just proved, the agent knows that $p = \frac{n_{CC} + 1}{n_{CC} + n_{CD} + 2}$ and $q = \frac{n_{DC} + 1}{n_{DC} + n_{DD} + 2}$. With these values, the agent can calculate the expected subjective utility of cooperating and defecting:

$$\mathrm{Util}(C) = p + (1 - p)U, \qquad \mathrm{Util}(D) = qV$$

If $\mathrm{Util}(C) > \mathrm{Util}(D)$ then the agent will cooperate; otherwise it will defect. This has a risk of locking an agent into one action forever, say cooperate, and then never having a chance to sample results for defection and thus never updating $q$. To avoid this, we use the trembling-hand mechanism (or $\epsilon$-greedy reinforcement learning): with small probability $\epsilon$ the agent performs the opposite action of what it intended.

The above agent is rational with respect to its subjective state $(U, V, p, q)$ but could be acting irrationally with respect to the objective game and the true proportion of cooperation.
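The agent described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the model's actual implementation: it assumes the payoff convention in which mutual cooperation pays 1, cooperating against a defector pays `U`, defecting against a cooperator pays `V`, and mutual defection pays 0. The class name `Agent`, its method names, and the default `eps` are all hypothetical.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    U: float           # assumed payoff for cooperating against a defector
    V: float           # assumed payoff for defecting against a cooperator
    eps: float = 0.05  # trembling-hand probability (illustrative default)
    # counts[(my_action, partner_action)], all starting at zero
    counts: dict = field(default_factory=lambda: {
        ("C", "C"): 0, ("C", "D"): 0, ("D", "C"): 0, ("D", "D"): 0})
    rng: random.Random = field(default_factory=random.Random)

    def p(self):
        """Estimated probability of seeing cooperation after cooperating."""
        cc, cd = self.counts[("C", "C")], self.counts[("C", "D")]
        return (cc + 1) / (cc + cd + 2)

    def q(self):
        """Estimated probability of seeing cooperation after defecting."""
        dc, dd = self.counts[("D", "C")], self.counts[("D", "D")]
        return (dc + 1) / (dc + dd + 2)

    def intended_action(self):
        """Pick the action with the higher expected subjective utility."""
        util_c = self.p() + (1 - self.p()) * self.U
        util_d = self.q() * self.V
        return "C" if util_c > util_d else "D"

    def act(self):
        """Intended action, flipped with probability eps (trembling hand)."""
        action = self.intended_action()
        if self.rng.random() < self.eps:
            action = "D" if action == "C" else "C"
        return action

    def observe(self, my_action, partner_action):
        """Record one interaction; learning is just counting."""
        self.counts[(my_action, partner_action)] += 1

agent = Agent(U=-0.5, V=1.5, eps=0.0)
first_choice = agent.intended_action()   # "D": 0.25 versus 0.75
for _ in range(5):
    agent.observe("D", "D")              # defection keeps meeting defection
later_choice = agent.intended_action()   # "C": 0.25 versus about 0.214
print(first_choice, later_choice)
```

With `eps` set to zero the hand never trembles, which makes the example deterministic; in a simulation a small positive `eps` keeps both sets of counts growing so that neither estimate is frozen.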

## Emotional contagion and rational argument in philosophical texts

November 5, 2015 by Artem Kaznatcheev

Last week I returned to blogging with some reflections on reading and the written word more generally. Originally, I was aiming to write a response to Roger Schank’s stance that “reading is no way to learn”, but I wandered off on too many tangents for a single post or for a coherent argument. The tangent that I left for this post is the role of emotion and personality in philosophical texts.

In my last entry, I focused on the medium-independent aspects of Schank’s argument, and identified two dimensions along which a piece of media and our engagement with it can vary: (1) passive consumption versus active participation, and (2) the level of personalization. The first continuum has a clearly better end on the side of more active engagement. If we are comparing mediums then we should prefer ones that foster more active engagement from the participants. The second dimension is more ambiguous: sometimes a more general piece of media is better than a bespoke piece. Which is better becomes particularly ambiguous when being forced to adapt a general approach to your special circumstances encourages more active engagement.

In this post, I will shift focus from comparing mediums to a particular aspect of text and arguments: emotional engagement. Of course, this also shows up in other mediums, but my goal this time is not to argue across mediums.

