Limits of prediction: stochasticity, chaos, and computation
September 30, 2014
Some of my favorite conversations are about prediction and its limits. For some, this is purely a practical topic, but for me it is a deeply philosophical discussion. Understanding the limits of prediction can inform the philosophies of science and mind, and even questions of free-will. As such, I wanted to share with you a World Science Festival video that THEREALDLB recently posted on /r/math. This is a selected five minute clip called “What Can’t We Predict With Math?” from a longer one and a half hour discussion called “Your Life By The Numbers: ‘Go Figure'” between Steven Strogatz, Seth Lloyd, Andrew Lo, and James Fowler. My post can be read without watching the panel discussion or even the clip, but watching the clip does make my writing slightly less incoherent.
I want to give you a summary of the clip that focuses on some specific points, bring in some of the discussions from elsewhere in the panel, and add some of my own commentary. My intention is to be relevant to metamodeling and the philosophy of science, but I will touch on the philosophy of mind and free-will in the last two paragraphs. This is not meant as a comprehensive overview of the limits of prediction, just some points to get you as excited as I am about this conversation.
Two types of prediction
In the clip, Steven Strogatz starts us off by distinguishing between two types of predictions: correlational and mechanistic. He presents these as two very different kinds, although elsewhere in the panel James Fowler provides a great example of how we can extract mechanism — causation, in particular — by exploring asymmetries in our observed data. I think it is instructive to take a short detour into Fowler’s example.
Fowler wants to distinguish between two reasons why we find correlations between individuals of the sort that Strogatz mentions (“if your friend is chubby, I can predict that you are chubby”). One mechanism might be that your friends are influencing you to be chubby (or to quit drinking, or to like the Ramones) by being chubby themselves. Another mechanism is that you choose to associate with people who are like you, so it isn’t that your friends caused you to be chubby, but rather that both of you being chubby caused you to pick them as friends. From static social network data, it is not at all obvious how one would distinguish between these two cases. Fowler presents an insightful approach: look at asymmetry in friendship. If I declare you as my friend, but you don’t declare me as a friend, then it is reasonable to assume that I pay more attention to you than you do to me. As such, your chubbiness would influence me via mechanism one, but not vice versa. From the perspective of mechanism two, however, I chose you as a friend, so my chubbiness affects whether I might have chosen you as a chubby buddy. Thus, by looking at which direction has the stronger correlation, we can start to distinguish between the two mechanisms and establish some sort of causation. Fowler’s example is abduction at its finest: we take a guess that is independent of the empirical data (i.e. our assumption about how asymmetric friendships structure information flow) and project it onto the data to extract some kind of understanding.
Understanding is the key word that Andrew Lo adds right after Strogatz in the clip. I think the key difference between Strogatz’s two categories is not the static versus dynamic nature that he stresses, but the presence of understanding: extracting or guessing an underlying theory of why something happens. I’ve discussed this distinction in more detail in my post on machine learning and prediction without understanding.
Stochasticity and certainty
Regardless of which type of prediction we are interested in, though, there are obstacles we need to overcome. These obstacles are the primary focus of the short clip. I want to start with Fowler’s obstacle: stochasticity. Fowler focuses on stochasticity from ignorance or lack of measurement, although elsewhere in the talk, Seth Lloyd also considers inherent stochasticity due to quantum mechanics — the difference between the two is not particularly important for the practical concerns of prediction.
In the case of physics or chemistry, stochasticity is seldom a problem. In fact, for statistical mechanics, assuming randomness actually helps us make better predictions. In the social sciences, however, we can’t hope for five sigma results — there are simply too few data points. Alternatively, we might not care about the average results or reliability of long term trends, but need to know a specific outcome: will my favorite candidate win this election? Will there be a military coup in this country during this year? Such specific questions can be extremely important to a lot of people, and a stochastic prediction is of little use. Navigating this chasm between stochastic models and public (or personal) demand for certainty is especially difficult in the social sciences as I’ve outlined in my post on models, modesty, and moral methodology.
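As a toy illustration of this chasm (all numbers here are invented for the example), consider a candidate whose true support is 52% among 1000 independent voters. A stochastic model can correctly call them the clear favorite, and yet the single outcome we actually care about still goes the other way surprisingly often:

```python
# A toy illustration (all numbers invented) of the gap between a good
# stochastic forecast and certainty about a single outcome: a candidate
# with a true 52% vote share among 1000 independent voters.
import random

random.seed(0)
trials = 10_000
wins = 0
for _ in range(trials):
    # one "election": each voter independently votes yes with p = 0.52
    votes = sum(random.random() < 0.52 for _ in range(1000))
    wins += votes > 500
win_rate = wins / trials
print(win_rate)  # a clear favorite, yet they still lose roughly 1 election in 10
```

The forecast “52% support” is accurate as a statistical statement, but it answers the yes/no question we actually asked only about nine times out of ten.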
Chaos and the prediction horizon
Personally, I find most discussions of the limits of prediction to be unreasonably focused on stochasticity. In reality, our process does not need to be inherently stochastic in order to throw a wrench into our predictions. The classic example is the weather, or the solar system over long time periods. Here the culprit is chaos, and during the panel (although not so much in the clip) Strogatz focuses heavily on this trouble-maker. Even in a deterministic system, nonlinear dynamics can take a small amount of uncertainty in measurement (say, due to small quantum fluctuations in the ideal case, or just poor sensors in the practical case) and amplify it exponentially, quickly reducing our confidence in the specifics of the system’s configuration. Thus, as we try to look further into the future, the quality of prediction degrades to the point where we can no longer act on those predictions in a meaningful way. This point defines the prediction horizon.
It is important to notice that thinking about chaos has snuck in a little bit of pragmatism. A useful way to look at the state of a physical system is as a point in phase space (usually the dimensions of this space are given by the position and momentum of all the moving parts). Of course, due to the uncertainty of our measurement of the initial position, in practice we represent the system not as a point but as a little ball in phase space. For many (models of) physical systems (I won’t go into the technical details), the volume of this ‘ball’ of uncertainty remains constant even if the system is chaotic. But although the volume of uncertainty is constant, the shape changes from a ball to something that starts to send tendrils all over phase-space. Thus, our overall uncertainty doesn’t increase, just how it is distributed across the properties we care about.
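A minimal sketch of this tendril-forming behavior uses Arnold’s cat map, a standard area-preserving chaotic map on the unit torus (the map and the grid of points below are my illustrative choices, not something from the panel). The map’s Jacobian [[2, 1], [1, 1]] has determinant 1, so the ‘volume’ of uncertainty is exactly conserved, yet a tiny square of initial conditions is quickly sheared across the whole space:

```python
# Arnold's cat map: (x, y) -> (2x + y, x + y) mod 1.
# Its Jacobian [[2, 1], [1, 1]] has determinant 1, so area in phase
# space is preserved -- but a small square of uncertainty gets sheared
# into long, thin tendrils spanning the whole torus.

def cat_map(x, y):
    return (2 * x + y) % 1.0, (x + y) % 1.0

# a 0.01 x 0.01 "ball" of initial uncertainty, sampled on a 20 x 20 grid
points = [(i / 2000.0, j / 2000.0) for i in range(20) for j in range(20)]

for _ in range(8):  # eight iterations of the map
    points = [cat_map(x, y) for x, y in points]

xs = [x for x, _ in points]
spread = max(xs) - min(xs)
print(spread)  # close to 1: the tendrils now span the torus
```

Total area never changed, but the uncertainty that started confined to a 0.01-wide square is now distributed across the entire space of configurations.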
This also means that we can still make many kinds of predictions about chaotic systems, as long as the property we care about interacts nicely with the system dynamics. For example, I can’t tell you with much certainty whether the temperature 3 days from now will be higher or lower than it is right now. But I can tell you with a lot of certainty that the weather three months from now (i.e. at the start of January) will be colder than right now (at least in New York, where I happen to be). Similarly, if I look at the famous Lorenz system, I might not be able to predict exactly where the particle is, but I will still be able to give you a rough description of the figure-eight orbit it will be following.
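Both faces of this show up in a numerical sketch like the following (the step size, time horizon, and the 1e-9 ‘measurement error’ are my own illustrative choices, and a crude Euler integrator is enough for a qualitative picture): two trajectories starting a billionth apart become macroscopically separated, so the specific state is unpredictable, yet both remain bounded on the attractor, so the rough orbit is still predictable.

```python
# Sensitive dependence in the Lorenz system with the classic parameters
# (sigma = 10, rho = 28, beta = 8/3), integrated with a crude Euler step.

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-9)  # a one-in-a-billion measurement error

max_sep = 0.0
for _ in range(8000):  # roughly 40 time units
    a, b = lorenz_step(a), lorenz_step(b)
    sep = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
    max_sep = max(max_sep, sep)

print(max_sep)                 # macroscopic: the exact state is lost...
print(max(abs(c) for c in a))  # ...but the trajectory stays bounded
```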
For more discussion of chaos, and a closer connection to our next topic of computation, see my earlier post: Computer science on prediction and the edge of chaos.
Computation and self-reference
Now, even the barrier of chaos doesn’t feel completely fundamental. The reason we can’t predict the future is the amplification of an initial uncertainty. If there was absolute and complete precision (whatever that means, and however it violates Heisenberg) in the initial conditions, we would be able to make perfect predictions of the future. Just give me more data!
This brings me to Seth Lloyd’s point and the final barrier to prediction: computation and self-reference. Even if we imagine a perfectly deterministic world, with perfectly known discrete initial conditions (such as the setting of 0s and 1s on the tape of a Turing machine), there are certain predictions that can’t be made in the general case. The classic one is whether the machine will halt or not, although Rice’s theorem gives us plenty of other examples. This limitation of prediction is not specific to Turing machines, but applies much more widely and rears its head in almost any setting where self-reference allows us to make a diagonalization argument (but see this cstheory question for some related discussion).
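The diagonalization behind this can be sketched in a few lines of code (the `halts` predictor below is hypothetical — no correct total implementation of it can exist, which is exactly the point): given any claimed halting predictor, we can build a program that does the opposite of whatever the predictor says about it.

```python
# A sketch of the diagonalization argument. `halts(program, data)` is a
# hypothetical predictor claimed to decide whether program(data) halts;
# the construction below shows any concrete candidate must be wrong.

def make_contrary(halts):
    """Build a program that the given predictor gets wrong on itself."""
    def contrary(program):
        if halts(program, program):
            while True:       # predictor says we halt, so loop forever
                pass
        return "halted"       # predictor says we loop forever, so halt
    return contrary

# Feed it any concrete candidate, e.g. one that always predicts looping:
always_no = lambda program, data: False
contrary = make_contrary(always_no)
print(contrary(contrary))  # halts -- contradicting the prediction
```

Whatever answer the predictor gives about `contrary(contrary)`, the program does the opposite, so no predictor can be right on every input.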
Before I move on, I want to address three important points that often confuse people:
- Non-computability applies to general problems, not specific instances. It is a limitation on the general ways of knowing not on specific bits of knowledge. It cannot be properly modeled with probability and there is no secret way to sidestep it, as long as you subscribe to the Church-Turing thesis (although make sure to see point 3). If you are a physicalist this means something like: “the statistics of measurement for any repeatable physical process can be approximated arbitrarily well by a Turing machine”. Of course, you and the whole of humanity would be just one such physical process and so would be bound by the rule. If you don’t like physicalism, then even from a more idealist perspective, you could also restate the thesis as “[computability] is a property of our apparatus of perception … [and] capture what is thinkable by us”.
- Self-reference does not guarantee that uncomputable problems will bother you. The classic example is Google’s PageRank: to know the rank of a site we need to know the rank of all the sites linking to it (potentially including itself), but to know their rank we need to know the rank of the sites linking to them (potentially including the original site). However, we know that Google solves this problem every day, because there is a fixed point we can find using linear algebra. For the more philosophically minded, consider sentences like “this sentence refers to itself”, or “this sentence is true”. Neither is paradoxical, although both are self-referential.
- Hypercomputing doesn’t save you from this observation. Although a hypercomputer (whatever that might be) might allow you to solve the halting problem for normal computers, this hypercomputer will have a halting problem of its own, and so you will still have unsolvable problems in this more powerful model of computation. You can just keep diagonalizing.
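The benign self-reference of point 2 can be sketched with power iteration on a toy web (the three-page link structure is invented for illustration; 0.85 is the commonly cited default damping factor): despite each rank depending on the others, the iteration settles into a fixed point.

```python
# Power iteration for PageRank on a toy three-page web.  Each page's
# rank depends on the ranks of the pages linking to it -- self-reference
# with a well-defined fixed point, found by simple iteration.

links = {0: [1], 1: [0, 2], 2: [0]}  # page -> pages it links to
n, d = 3, 0.85                       # page count, damping factor

rank = [1.0 / n] * n
for _ in range(100):
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        share = rank[page] / len(outs)   # split rank among outgoing links
        for target in outs:
            new[target] += d * share
    rank = new

print(rank)  # converged ranks; page 0 (most linked-to) ranks highest
```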
The third point has an interesting caveat: if you believe yourself to be a hypercomputer and everything in the universe to be at most normal computers (or, more commonly: if you believe yourself to be capable of general computation, but the universe to be a finite-state machine) then you might be able to predict everything as long as you don’t try to predict yourself. During the panel, Lloyd related this sort of idea to free will. One can have a rational belief about their own free-will if the self-referential nature of trying to predict your own future actions (will I want an apple or a pear tomorrow morning?) is sufficient (in the sense of overcoming point 2) to create unpredictable problems for yourself. In such a setting, you will be justified in believing in your own free will because you cannot in general predict your own actions, even if you live in a perfectly deterministic universe where others can perfectly predict your actions by being outside your self-referential loop (by, for instance, not telling you their prediction).
Of course, this self-reference can be scaled to the level of societies, which is the original point raised by Lo to bring Lloyd to computability. We care about predictions made about us, and thus are part of a self-referential loop that might make certain kinds of predictions impossible. This is a classic problem in economics and marketing: once a prediction, theory, or model becomes popular and known to a lot of people, it often stops working.