# Rationality, the Bayesian mind and their limits

Bayesianism is one of the more popular frameworks in cognitive science. Alongside other probabilistic models of cognition, it enjoys broad support in the field (Chater, Tenenbaum, & Yuille, 2006). To summarize Bayesianism far too succinctly: it views the human mind as full of beliefs that we hold as true with some subjective probability. We then act on these beliefs to maximize expected return (or maybe just satisfice) and update the beliefs according to Bayes’ law. For a better overview, I would recommend the foundational work of Tom Griffiths (in particular, see Griffiths & Yuille, 2008; Perfors et al., 2011).
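As a rough sketch of this picture (all numbers, hypotheses, and payoffs below are made up for illustration, not drawn from any particular paper), a minimal Bayesian agent might look like:

```python
import numpy as np

# A minimal sketch of a Bayesian agent: beliefs are subjective probabilities
# over hypotheses, updated by Bayes' law; actions maximize expected return.
prior = np.array([0.5, 0.5])        # p(H) over two hypotheses
likelihood = np.array([0.8, 0.3])   # p(evidence | H)

posterior = prior * likelihood      # Bayes' law: p(H | e) proportional to p(e | H) p(H)
posterior /= posterior.sum()        # normalize

payoffs = np.array([3.0, 1.0])      # hypothetical payoff of an action under each hypothesis
expected_return = posterior @ payoffs
print(posterior, expected_return)
```

The whole framework lives in those two steps: multiply-and-normalize for belief updating, and a dot product for expected return.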

This use of Bayes’ law has led to a widespread association of Bayesianism with rationality, especially across the internet in places like LessWrong — Kaj Sotala has written a good overview of Bayesianism there. I’ve already written a number of posts about the dangers of fetishizing rationality and some approaches to addressing them, including bounded rationality, the Baldwin effect, and interface theory. In some of these, I’ve touched on Bayesianism. I’ve also written about how to design Bayesian agents for simulations in cognitive science and evolutionary game theory, and even connected it to quasi-magical thinking and Hofstadter’s superrationality for Kaznatcheev, Montrey & Shultz (2014; see also Masel, 2007).

But I haven’t written about Bayesianism itself.

In this post, I want to focus on some of the challenges faced by Bayesianism and the associated view of rationality. And maybe point to some approaches to resolving them. This is based in part on three old questions from the Cognitive Sciences StackExchange: What are some of the drawbacks to probabilistic models of cognition?; What tasks does Bayesian decision-making model poorly?; and What are popular rationalist responses to Tversky & Shafir?

Let’s start with Tversky & Shafir. The scientists are probably best known today for their prominent roles in Kahneman’s (2011) Thinking, Fast and Slow. The three were colleagues and made many important contributions to psychology — I’d highly recommend reading Kahneman’s book for a broad overview. I’ll focus on the early 90s, when Tversky & Shafir observed several violations of rationality in human participants. In particular, they noted the disjunction effect: a violation of the sure-thing principle (for examples, see Shafir & Tversky, 1992; Tversky & Shafir, 1992).

An example of the violation they saw was in the Prisoner’s dilemma (Shafir & Tversky, 1992): if a person knew their partner had defected then they also defected (only a 3% cooperation rate); if a person knew their partner had cooperated then they usually still defected (only a 16% cooperation rate). However, if they did not know whether their partner had defected or cooperated, then they cooperated at a much higher rate (37%). This violates the naive rationalist expectation of a cooperation rate somewhere between 3% and 16% in the unknown condition. This is called a violation of the sure-thing principle.

More formally, given an event X and a conditioning variable that can take only two possible values A and B, probability theory requires $p(X)$ to lie between $p(X|A)$ and $p(X|B)$. A violation occurs when $p(X) > p(X|A)$ and $p(X) > p(X|B)$ (or both with $<$ instead). In addition to Shafir & Tversky’s (1992) example with the Prisoner’s dilemma, violations of the sure-thing principle have also been shown by Tversky & Shafir (1992) in a two-stage gambling task; Townsend et al. (2000) in a face-categorization task; and others.
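We can check this numerically. By the law of total probability, $p(X) = w \, p(X|A) + (1-w) \, p(X|B)$ for some mixing weight $w$, so no subjective belief about the partner can produce the observed 37% from the two conditional rates. The conditional rates below are from Shafir & Tversky (1992); the sweep over beliefs is just an illustration:

```python
import numpy as np

# The law of total probability pins p(cooperate) between the two conditional
# rates, whatever subjective belief w the participant holds about the partner.
p_if_defect = 0.03   # p(cooperate | partner defected), Shafir & Tversky (1992)
p_if_coop = 0.16     # p(cooperate | partner cooperated)
observed = 0.37      # cooperation rate when the partner's choice is unknown

for w in np.linspace(0, 1, 101):                  # w = subjective p(partner cooperates)
    p_coop = w * p_if_coop + (1 - w) * p_if_defect
    assert p_if_defect <= p_coop <= p_if_coop     # always inside [0.03, 0.16]

print(observed > p_if_coop)  # True: 37% is unreachable by any belief w
```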

Shafir & Tversky (1992) explained this effect through quasi-magical thinking. Even though the participants knew they had no causal influence on their partner’s choice, when that choice was unknown they still “did their part” to cause a favorable outcome.

Note that given the false belief that your actions magically affect the outcome of the other participant’s decision, it is no longer irrational to cooperate at a higher rate when you don’t know the partner’s decision.

In the famous Linda problem, Tversky & Kahneman (1983) described Linda — a single, outspoken, socially-conscious philosophy graduate — and then asked participants to make a probability judgement $p(\text{BT})$ of Linda being a bank teller, or a probability judgement $p(\text{BT} \;\&\; \text{F})$ of Linda being a bank teller and a feminist. In any Bayesian model (without some grafted mechanisms or weird latent variables) you need to have $p(\text{BT}) \geq p(\text{BT} \;\&\; \text{F})$, but the participants judged $p(\text{BT}) < p(\text{BT} \;\&\; \text{F})$ — committing the conjunction fallacy.
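The inequality holds for every possible joint distribution over the two Boolean events, since the conjunction is a sub-event of either conjunct. A quick brute-force check (purely illustrative) confirms that no joint distribution commits the conjunction fallacy:

```python
import random

# For any joint distribution over the Boolean events BT (bank teller) and
# F (feminist), p(BT & F) <= p(BT): the conjunction is a sub-event of BT.
random.seed(0)
violations = 0
for _ in range(10_000):
    w = [random.random() for _ in range(4)]          # random joint over {0,1}^2
    total = sum(w)
    p00, p01, p10, p11 = (x / total for x in w)      # p(BT=i, F=j)
    p_bt = p10 + p11                                 # marginal p(BT)
    if p11 > p_bt + 1e-12:                           # p(BT & F) > p(BT)?
        violations += 1
print(violations)  # 0 — no joint distribution commits the conjunction fallacy
```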

Tversky & Kahneman (1983)’s ad-hoc explanation for this was the representativeness heuristic. But Gavanski & Roskos-Ewoldsen (1991) recreated the fallacy in a setting showing that this ad-hoc heuristic is not sufficient. Alternatively, it is natural to suspect that Tversky & Kahneman (1983)’s result could be an artifact of participants not understanding the modern concept of probability. But Sides et al. (2002) accounted for this by using a betting paradigm that invokes probabilities implicitly instead of asking participants to report numeric values. They showed that the conjunction fallacy is independent of numeric probability reporting and is thus an intrinsic ‘error’.

The above back-and-forth of ad-hoc fixes via heuristics and biases is a particularly important point. It shows how unconstrained Bayesian models run the risk of being just-so stories in the sense of Bowers & Davis (2012).

For a final highlight: when conducting a questionnaire, the order in which questions are asked changes the resulting probability judgements. For a purely Bayesian approach, the joint probability of asking A then B and getting a specific outcome is $p(A)p(B|A) = p(A \& B) = p(B \& A) = p(B)p(A|B)$. This would suggest that the order in which questions are asked shouldn’t matter. But Feldman & Lynch (1988), Schuman & Presser (1996), Moore (2002), and countless pollsters have shown that the order of questions matters. Hence, we have a failure of commutativity.
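To see why order cannot matter for a single joint distribution, here is a worked example with a made-up joint distribution over yes/no answers to two questions:

```python
# A made-up joint distribution over yes/no answers to questions A and B.
joint = {
    ("yes", "yes"): 0.35, ("yes", "no"): 0.15,
    ("no", "yes"): 0.25, ("no", "no"): 0.25,
}
p_a = sum(p for (a, b), p in joint.items() if a == "yes")   # marginal p(A) = 0.50
p_b = sum(p for (a, b), p in joint.items() if b == "yes")   # marginal p(B) = 0.60
p_b_given_a = joint[("yes", "yes")] / p_a                   # conditional p(B|A)
p_a_given_b = joint[("yes", "yes")] / p_b                   # conditional p(A|B)

# Both orders recover the same joint probability p(A & B) = 0.35.
assert abs(p_a * p_b_given_a - p_b * p_a_given_b) < 1e-12
print(p_a * p_b_given_a)  # 0.35
```

The asymmetry human participants show therefore cannot come from conditioning alone; it requires some mechanism outside the single-joint-distribution picture.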

This order effect is not confined to questions. It also applies to integrating evidence. The strongest point of Bayesianism is a clear theory of how to update hypotheses given evidence — i.e. Bayes’ rule. Unfortunately for Bayes’ rule, $p(H|A \& B) = p(H|B \& A)$, but Shanteau (1970) and Hogarth & Einhorn (1992) showed that for humans this is not always the case. Unsurprisingly, they present ad-hoc heuristics-and-biases alternatives to explain this.
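A sketch of why sequential Bayesian updating is order-invariant (assuming the two pieces of evidence are conditionally independent given the hypothesis; all numbers are illustrative):

```python
import numpy as np

def update(belief, likelihood):
    """One step of Bayes' rule: multiply by the likelihood and renormalize."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

prior = np.array([0.5, 0.5])    # two hypotheses
lik_A = np.array([0.9, 0.2])    # p(evidence A | H)
lik_B = np.array([0.4, 0.7])    # p(evidence B | H)

post_AB = update(update(prior, lik_A), lik_B)   # see A first, then B
post_BA = update(update(prior, lik_B), lik_A)   # see B first, then A
print(np.allclose(post_AB, post_BA))  # True: p(H | A & B) = p(H | B & A)
```

Multiplication commutes, so the order the likelihoods arrive in washes out — which is exactly what the human data fails to do.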

More importantly, these order effects are not confined to the artificial setting of the university lab. They can also be found in the wild. Non-commutativity has been seen in natural settings, including clinical diagnosis (Bergus et al., 1998) and jurors’ evaluation of the two sides of a dispute (McKenzie, Lee, & Chen, 2002).

If all of these standard violations of probability aren’t enough, even more exotic violations are possible. For example, Aerts & Sozzo (2011) studied membership judgements for pairs of concept combinations. They found that among their participants there were dependencies between concept pairs that violated Bell-type inequalities. Thus, this data could not be fit by any reasonable classical joint distribution over the concept combinations.
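The flavour of the classical bound at stake can be shown with a generic CHSH-style check (this is an illustration of Bell-type bounds in general, not Aerts & Sozzo’s exact construction): any classical joint distribution is a mixture of deterministic assignments, so it suffices to brute-force those.

```python
import itertools

# CHSH quantity S = A1*B1 + A1*B2 + A2*B1 - A2*B2, where each variable takes
# values in {-1, +1}. Every classical joint distribution is a mixture of
# deterministic assignments, so max |S| over assignments bounds all of them.
best = max(
    abs(a1*b1 + a1*b2 + a2*b1 - a2*b2)
    for a1, a2, b1, b2 in itertools.product([-1, 1], repeat=4)
)
print(best)  # 2 — correlations exceeding this fit no classical joint distribution
```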

Of course, this list is not exhaustive or comprehensive. I also don’t know how robust each individual study is. But seeing this many anecdotes does make me wonder why people are so inclined to take Bayesian models as the obvious choice. In the end, it is probably just that all models are wrong but some are useful. We can use Bayesian models as a starting point and build from there. We just have to make sure that our building doesn’t produce arbitrary just-so stories.

### References

Aerts, D. & Sozzo, S. (2011) Quantum Structure in Cognition: Why and How Concepts Are Entangled. Quantum Interaction 7052: 116-127.

Bergus, G.R., Chapman, G.B., Levy, B.T., Ely, J.W., & Oppliger, R.A. (1998) Clinical diagnosis and order of information. Medical Decision Making 18: 412-417.

Bowers, J.S. & Davis, C.J. (2012). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138: 389.

Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Science, 10(7): 287-291.

Feldman, J.M. & Lynch, J.G. (1988) Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology 73: 421-435.

Gavanski, I., & Roskos-Ewoldsen, D.R. (1991) Representativeness and conjoint probability. Journal of Personality and Social Psychology 61(2): 181-194.

Griffiths, T. L., & Yuille, A. (2008). A primer on probabilistic inference. In M. Oaksford and N. Chater (Eds.). The probabilistic mind: Prospects for rational models of cognition. Oxford: Oxford University Press.

Hogarth, R.M. & Einhorn, H.J. (1992) Order effects in belief updating: the belief-adjustment model. Cognitive Psychology 24: 1-55.

Kahneman, D. (2011). Thinking: fast and slow. Macmillan.

Kaznatcheev, A., Montrey, M., & Shultz, T.R. (2014). Evolving useful delusions: Subjectively rational selfishness leads to objectively irrational cooperation. Proceedings of the 36th annual conference of the cognitive science society. arXiv: 1405.0041v1.

Masel, J. (2007). A Bayesian model of quasi-magical thinking can explain observed cooperation in the public good game. Journal of Economic Behavior & Organization, 64(2): 216-231.

McKenzie, C.R.M., Lee, S.M., & Chen, K.K. (2002) When negative evidence increases confidence: change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making 15: 1-18.

Moore, D.W. (2002) Measuring new types of question-order effects. Public Opinion Quarterly 66: 80-91

Perfors, A., Tenenbaum, J.B., Griffiths, T. L., & Xu, F. (2011). A tutorial introduction to Bayesian models of cognitive development. Cognition, 120, 302-321.

Schuman, H., & Presser, S. (1996). Questions and answers in attitude surveys: Experiments on question form, wording, and context. Sage.

Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reasoning and choice. Cognitive Psychology, 24: 449-474.

Shanteau, J.C. (1970) An additivity model for sequential decision making. Journal of Experimental Psychology. 85: 181-191.

Sides, A., Osherson, D., Bonini, N., and Viale, R. (2002) On the reality of the conjunction fallacy. Memory & Cognition 30(2): 191-198.

Townsend, J.T., Silva, K.M., Spencer-Smith, J., & Wenger, M. (2000) Exploring the relations between categorization and decision making with regard to realistic face stimuli. Pragmatics and Cognition 8: 83-105.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4): 293-315.

Tversky, A., & Shafir, E. (1992). The disjunction effect in choice under uncertainty. Psychological Science, 3(5): 305-309.