Objective and subjective rationality

My colleagues and I share a strong interest in combining learning, development, and evolution. For me, the particular interest is how evolution can build better learners. However, one assumption I often see made implicitly is that the learner understands how game payoffs affect his fitness. In particular, when learning it is usually assumed that the feedback signal answering “am I doing well?” correlates perfectly with the evolutionary fitness answer to that same question (at least when ignoring inclusive-fitness effects). Marcel Montrey and I decided to question this assumption.

More formally, in the context of evolutionary game theory, there is usually some symmetric game G that is being played by pairs of agents (we will stick to two-player symmetric games for now, but the generalization to non-symmetric and multiplayer games is straightforward). When two agents interact, they have some procedure (which could adapt over their lifetime through learning) for picking which strategy they are going to play. I will refer to these strategies by numbers (1, 2, 3, …, n for an n-strategy game), but you could equally well choose a different naming scheme. If Alice chooses strategy i and Bob chooses strategy j, then Alice’s fitness is changed by an amount G[i,j] and Bob’s fitness is changed by an amount G[j,i]. The agents will have a chance of reproduction that is proportional to their fitness. In the orthodox learning setting, Alice will also know that she chose i, and have access to her fitness change G[i,j] and maybe even Bob’s choice j. This information allows her to learn and update her procedure for picking her strategy for the next interaction.
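To make the setup concrete, here is a minimal sketch of one interaction. The payoff matrix is a hypothetical example (a Prisoner’s dilemma), and strategies are indexed 0 to n-1 rather than 1 to n, but nothing hinges on either choice:

```python
# A hypothetical 2-strategy symmetric game G (Prisoner's dilemma payoffs).
# G[i][j] is the focal agent's payoff for playing i against an opponent playing j.
G = [[3, 0],
     [5, 1]]

def interact(G, i, j):
    """Return the fitness changes for Alice (playing i) and Bob (playing j)."""
    return G[i][j], G[j][i]

alice_gain, bob_gain = interact(G, 0, 1)  # Alice cooperates, Bob defects
```

In the orthodox setting, `alice_gain` is exactly the feedback signal Alice’s learning procedure receives.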

My qualm is with the fact that Alice is aware of her fitness change G[i,j] and can use it to guide learning. It is not at all obvious to me that a learner would have such information. For instance, if I go to the gym and work out, I feel pain in my muscles, which can be seen as a feedback signal saying “don’t do this again, you are hurting yourself and reducing fitness!” In reality, I am increasing my fitness through exercise, and thankfully I have evolved another mechanism that releases dopamine and makes me feel happy about the exercise, giving me the signal “do this again, it is increasing fitness!” This feedback mechanism, however, is itself under the influence of evolution, and there is no a priori reason to believe that the perceived fitness change will correlate with the actual effect on my fitness.

This can be stated more precisely in evolutionary game theory. There is a global ‘real’ game G as described before, but Alice also has her own internal conception of the game H_A, and Bob might have a different conception H_B. When Alice uses strategy i and Bob uses strategy j, their fitness is changed according to the real game: Alice’s fitness is changed by G[i,j] and Bob’s by G[j,i]. However, their learning algorithms do not know the ‘real’ game and get no feedback from it at all. Alice knows she played strategy i, and she feels like she received a fitness payoff H_A[i,j], while Bob feels he received a fitness payoff H_B[j,i]. H_A and H_B might be very different from each other and/or from G. Because of this, even if Alice and Bob have the same learning algorithm, they might behave differently, since they will be getting different feedback signals even when they perform the same actions. Therefore, evolution can act on these internal conceptions H_A and H_B.
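The split between objective and subjective feedback can be sketched in a few lines. The matrices here are hypothetical illustrations, not results: fitness accrues from the real game G, while the only signal the learner ever sees comes from her own conception H:

```python
# Fitness accrues from the 'real' game G; learning feedback comes only from
# the agent's internal conception H. Both matrices are illustrative.
G   = [[3, 0], [5, 1]]   # the real game (affects fitness and reproduction)
H_A = [[4, 2], [1, 0]]   # Alice's internal conception (affects learning only)

def play(G, H_self, i, j):
    """Return (actual fitness change, perceived payoff) for the focal agent."""
    return G[i][j], H_self[i][j]

fitness_change, perceived = play(G, H_A, 0, 0)
# fitness_change is 3, but Alice's learner only ever sees perceived, which is 4
```

Evolution selects on the first return value; learning updates on the second. Nothing forces the two to agree.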

We will fix a learning/inference/production rule for agents, and allow the internal conception to evolve. If the production rule is pure rationality based on your internal conception of the game, then we recover standard evolutionary game theory, and the agents’ genotype can just be their strategy (or, at worst, a fixed probability distribution over strategies); this does not address learning. To incorporate learning, Marcel and I decided to use the simplest rational learning procedure: Bayes’ rule. Our agents will use Bayes’ rule to update their expected utility for actions based on observations of previous outcomes, according to their internal conception of the game. They will use the strategy that has the highest expected utility. Thus, the agents are rational based on their internal representation of the game: we call this subjective rationality.
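One concrete way to realize such a subjectively rational learner, sketched under assumptions not spelled out above (a Dirichlet prior over the partner’s strategy, tracked as pseudo-counts, with maximum-expected-utility action selection), is:

```python
# Sketch of a subjectively rational Bayesian learner. The agent keeps a
# Dirichlet posterior (pseudo-counts) over which strategy the partner plays,
# updates it by Bayes' rule after each observation, and picks the strategy
# with the highest expected utility under her internal conception H.
H = [[4, 0], [1, 3]]   # hypothetical internal conception of the game

class SubjectiveLearner:
    def __init__(self, H, prior=1.0):
        self.H = H
        self.counts = [prior] * len(H)   # Dirichlet pseudo-counts over partner's moves

    def observe(self, j):
        """Bayesian update: seeing the partner play j bumps its pseudo-count."""
        self.counts[j] += 1

    def best_strategy(self):
        """Strategy with the highest subjective expected utility under H."""
        total = sum(self.counts)
        probs = [c / total for c in self.counts]
        utils = [sum(p * u for p, u in zip(probs, row)) for row in self.H]
        return max(range(len(utils)), key=utils.__getitem__)
```

Note that only H enters the update; the learner is fully rational with respect to her conception, whatever its relation to G.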

To allow the agents to explore different strategies during their lifetime (and not lock into one strategy) we will use a standard technique from economics: the trembling hand. If Alice wants to perform action i then she will try to do so and succeed with probability 1 - \epsilon. With probability \epsilon she’ll select one of the n strategies uniformly at random. Of course, to make this trembling meaningful, we have to select \epsilon high enough (or the expected life-spans of agents have to be long enough) to make sure that with high probability they try each of the n strategies by accident during their lifetime.
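A minimal sketch of this action-selection noise (the value of \epsilon here is an arbitrary illustration):

```python
import random

def tremble(intended, n, eps=0.05, rng=random):
    """With probability eps, replace the intended strategy by a uniform draw
    over all n strategies (which may coincide with the intended one)."""
    if rng.random() < eps:
        return rng.randrange(n)   # slip: uniform over all n strategies
    return intended
```

Over a lifetime of T interactions, a given strategy is accidentally sampled with probability roughly 1 - (1 - \epsilon/n)^T, which is what has to be kept close to 1.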

For an inviscid (well-mixed) population, I expect agents whose internal conceptions are qualitatively similar to G will fare better. The population as a whole will converge towards internal conceptions that are consistent with the ‘real’ world, and thus behave with objective rationality. This is a boring control; the interesting case is structured populations. There, I predict that agents will evolve internal conceptions that are not necessarily similar to G. Their conceptions will indirectly take inclusive-fitness effects into account. This will allow for the emergence of objectively irrational behavior, even though the agents’ learning rule is subjectively rational. Specifically, for games on random k-regular graphs, I predict that the internal conceptions will converge towards the Ohtsuki-Nowak transform of G. What does that mean? In future posts Marcel and I will introduce random k-regular graphs and the Ohtsuki-Nowak transform and make this prediction more precise.