Cross-validation in finance, psychology, and political science

A large chunk of machine learning (although not all of it) is concerned with predictive modeling, usually in the form of designing an algorithm that takes in some data set and returns an algorithm (or sometimes, a description of an algorithm) for making predictions based on future data. In terminology more friendly to the philosophy of science, we may say that we are defining a rule of induction that will tell us how to turn past observations into a hypothesis for making future predictions. Of course, Hume tells us that if we are completely skeptical then there is no justification for induction — in machine learning we usually know this as a no-free-lunch theorem. However, we still use induction all the time, usually with some confidence because we assume that the world has regularities that we can extract. Unfortunately, this just shifts the problem, since there are countless possible regularities and we have to identify ‘the right one’.

Thankfully, this restatement of the problem is more approachable if we assume that our data set did not conspire against us. That being said, every data set, no matter how ‘typical’, has some idiosyncrasies, and if we tune in to these instead of the ‘true’ regularity then we say we are over-fitting. Being aware of and circumventing over-fitting is usually one of the first lessons of an introductory machine learning course. The general technique we learn is cross-validation or out-of-sample validation. One round of cross-validation consists of randomly partitioning the data into a training and a validating set, then running the induction algorithm on the training set to generate a hypothesis, which we test on the validating set. A ‘good’ machine learning algorithm (or rule for induction) is one where the performance in-sample (on the training set) is about the same as out-of-sample (on the validating set), and both performances are better than chance. The technique is so foundational that the only reliable way to earn zero on a machine learning assignment is by not cross-validating your predictive models. The technique is so ubiquitous in machine learning and statistics that the StackExchange site dedicated to statistics is named Cross Validated. The technique is so…

You get the point.

If you are a regular reader, you can probably induce from past posts that my point is not to write an introductory lecture on cross-validation. Instead, I wanted to highlight some cases in science and society when cross-validation isn’t used, when it needn’t be used, and maybe even when it shouldn’t be used.
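The round of cross-validation described above is simple enough to sketch directly. Here is a minimal illustration in Python; the helper names (`one_round_cv`, `majority_inducer`, `accuracy`) and the 70/30 split are my own illustrative choices, not anything canonical:

```python
import random
from collections import Counter

def one_round_cv(data, labels, induce, score, train_frac=0.7, seed=0):
    """One round of cross-validation: randomly partition the data into a
    training and a validating set, run the induction algorithm on the
    training set, and report in-sample vs out-of-sample performance."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(train_frac * len(idx))
    train, valid = idx[:cut], idx[cut:]
    hyp = induce([data[i] for i in train], [labels[i] for i in train])
    in_sample = score(hyp, [data[i] for i in train], [labels[i] for i in train])
    out_of_sample = score(hyp, [data[i] for i in valid], [labels[i] for i in valid])
    return in_sample, out_of_sample

# A deliberately trivial rule of induction: always predict the majority label.
def majority_inducer(xs, ys):
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority

def accuracy(hyp, xs, ys):
    return sum(hyp(x) == y for x, y in zip(xs, ys)) / len(ys)
```

A rule of induction that memorizes the idiosyncrasies of the training set will show a large gap between the two returned scores; that gap is over-fitting in this vocabulary.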

Big data, prediction, and scientism in the social sciences

Much of my undergrad was spent studying physics, and although I still think that a physics background is great for a theorist in any field, there are some downsides. For example, I used to make jokes like: “soft isn’t the opposite of hard sciences, easy is.” Thankfully, over the years I have started to slowly grow out of these condescending views. Of course, apart from amusing anecdotes, my past bigotry would be of little importance if it weren’t shared by a surprising number of grown physicists. For example, Sabine Hossenfelder — an assistant professor of physics in Frankfurt — writes in a recent post:

It isn’t so surprising that social scientists themselves are unhappy because the boat of inadequate skills is sinking in the data sea and physics envy won’t keep it afloat. More interesting than the paddling social scientists is the public opposition to the idea that the behavior of social systems can be modeled, understood, and predicted.

As a blogger I understand that we can sometimes be overly bold and confrontational. As an informal medium, I have no fundamental problem with such strong statements or even straw-men if they are part of a productive discussion or critique. If there is no useful discussion, I would normally just make a small comment or ignore the post completely, but this time I decided to focus on Hossenfelder’s post because it highlights a common symptom of interdisciplinitis: an outsider thinking that they are addressing people’s critique — usually by restating an obvious and irrelevant argument — while completely missing the point. Also, her comments serve as a nice bow to tie together some thoughts that I’ve been wanting to write about recently.

Kleene’s variant of the Church-Turing thesis

In 1936, Alonzo Church, Alan Turing, and Emil Post each published independent papers on the Entscheidungsproblem, introducing the lambda calculus, Turing machines, and Post-Turing machines as mathematical models of computation. A myriad of other models followed, many of them taking seemingly unrelated approaches to the computable: algebraic, combinatorial, linguistic, logical, mechanistic, etc. Of course, all of these models were shown to be equivalent in what they could compute, and this great heuristic coherence led mathematicians to formulate the Church-Turing thesis. As with many important philosophical notions, over the last three-quarters of a century, the thesis has gradually changed. In a semi-historical style, I will identify three progressively more empirical formulations with Kleene, Post, and Gandy. For this article, I will focus on the purely mathematical formulation by Kleene, and reserve the psychological and physical variants for next time.

Mathematicians and logicians begat the Church-Turing thesis, so at its inception it was a hypothesis about the Platonic world of mathematical ideas and not about the natural world. There are those that follow Russell (and to some extent Hilbert) and identify mathematics with tautologies. This view is not typically held among mathematicians, who, following in the footsteps of Gödel, know how important it is to distinguish between the true and the provable. Here I side with Lakatos in viewing logic and formal systems as tools to verify and convince others about our intuitions of the mathematical world. Due to Gödel’s incompleteness theorems and decades of subsequent results, we know that no single formal system will be a perfect lens on the world of mathematics, but we do have preferred ones, like ZFC.

Misunderstanding falsifiability as a power philosophy of Scientism

I think that trying to find one slogan that captures all of science and nothing else is a fool’s errand. However, it is an appealing errand given our propensity to want to classify and delimit the things we care about. It is also an errand that often takes a central role in the philosophy of science.

Just like with almost any modern thought, if we try hard enough then we can trace philosophy of science back to the Greeks and discuss the contrasting views of Plato and Aristotle. As fun as such historical excursions might be, it seems a little silly given that the term scientist was not coined until 1833, and even under different names our current conception of scientists would not stretch much further back than the natural philosophers of the 17th century. Even the early empiricism of these philosophers, although essential as a backdrop and a foundational shift in view, is more of an overall metaphysical outlook than a dedicated philosophy of science.

Algorithmic Darwinism

The workshop on computational theories of evolution started off on Monday, March 17th with Leslie Valiant — one of the organizers — introducing his model of evolvability (Valiant, 2009). The original name was meant to capture what type of complexity can be achieved through evolution. Unfortunately — especially at this workshop — evolvability already had a different, more popular meaning in biology: mechanisms that make an organism or species ‘better’ at evolving, in the sense of higher mutation rates, de novo genes, recombination through sex, etc. As such, we need a better name, and I am happy to take on the renaming task.

Why academics should blog and an update on readership

It’s that time again: TheEGG has passed a milestone — 150 posts under our belt! — and so I feel obliged to reflect on blogging plus update the curious on the readership statistics.

About a month ago, Nicholas Kristof bemoaned the lack of public intellectuals in the New York Times. Some people responded with defenses of the ‘busy academic’, while others agreed but shifted the conversation to blogs from the more traditional media Kristof was focused on. As a fellow blogger, I can’t help but support this shift, but I also can’t help but notice the conflation of two very different notions: the public intellectual and the public educator.

Computational theories of evolution

If you look at your typical computer science department’s faculty list, you will notice the theorists are a minority. Sometimes they are further subdivided by being culled off into mathematics departments. As such, any institute that unites and strengthens theorists is a good development. That was my first reason for excitement two years ago when I learned that a $60 million grant would establish the Simons Institute for the Theory of Computing at UC Berkeley. The institute’s mission is close to my heart: bringing the study of theoretical computer science to bear on the natural sciences; an institute for the algorithmic lens. My second reason for excitement was that one of the inaugural programs is evolutionary biology and the theory of computing. Throughout this term, a series of workshops is being held to gather and share the relevant experience.

Right now, I have my conference straw hat on, as I wait for a flight transfer in Dallas on my way to one of the events in this program, the workshop on computational theories of evolution. For the next week I will be in Berkeley absorbing all there is to know on the topic. Given how much I enjoyed Princeton’s workshop on natural algorithms in the sciences, I can barely contain my excitement.

From heuristics to abductions in mathematical oncology

As Philip Gerlee pointed out, mathematical oncologists have contributed two main focuses to cancer research. Following Nowell (1976), they’ve stressed the importance of viewing cancer progression as an evolutionary process, and — of less clear-cut origin — recognizing the heterogeneity of tumours. Hence, it would seem appropriate that mathematical oncologists might enjoy Feyerabend’s philosophy:

[S]cience is a complex and heterogeneous historical process which contains vague and incoherent anticipations of future ideologies side by side with highly sophisticated theoretical systems and ancient and petrified forms of thought. Some of its elements are available in the form of neatly written statements while others are submerged and become known only by contrast, by comparison with new and unusual views.

If you are a total troll or pronounced pessimist you might view this as even lending credence to some anti-scientism views of science as a cancer of society. This is not my reading.

For me, the important takeaway from Feyerabend is that there is no single scientific method or overarching theory underlying science. Science is a collection of various tribes and cultures, with their own methods, theories, and ontologies. Many of these theories are incommensurable.

Misleading models in mathematical oncology

I have an awkward relationship with mathematical oncology, mostly because oncology has an awkward relationship with math. Although I was vaguely familiar that evolutionary game theory (EGT) could be used in cancer research, mostly through Axelrod et al. (2006), I never planned to work on cancer. I wasn’t eager to enter the field because I couldn’t see how heuristic models could be of use in medicine; I thought only insilications could be useful, but EGT was not at a level of sophistication where it could build predictive models. I worried that selling non-predictive models as advice for treatment would only cause harm. However, the internet being the place it is, I ended up running into David Basanta — one of the major advocates of EGT in oncology — and Jacob Scott on Twitter. After looking through some of the literature, I realized that most of experimental cancer research was more piecemeal than I expected and theory was based mostly on ad-hoc mental models. This convinced me that there is room for clear mathematical (and maybe computational) reasoning to help formalize and explore these mental models. Now we have a paper applying the Ohtsuki-Nowak transform to studying edge effects in the go-grow game prepped (Kaznatcheev, Scott, & Basanta, 2013), and David and I have a project on chronic myeloid leukemia in the works. The first is a heuristic model building on top of previously developed tools (from my experience, it is rather uncommon to build directly on others’ work in evolutionary game theory and mathematical oncology), and the other is an abductive model using a combination of analytic and machine learning techniques to produce a predictive tool useful in the clinic.

Approximating spatial structure with the Ohtsuki-Nowak transform

Can we describe reality? As a general philosophical question, I could spend all day discussing it and never arrive at a reasonable answer. However, if we restrict to the sort of models used in theoretical biology, especially to the heuristic models that dominate the field, then I think it is relatively reasonable to conclude that no, we cannot describe reality. We have to admit our current limits and rely on thinking of our errors in the dual notions of assumptions or approximations. I usually prefer the former and try to describe models in terms of the assumptions that if met would make them perfect (or at least good) descriptions. This view has seemed clearer and more elegant than vague talk of approximations. It is the language I used to describe the Ohtsuki-Nowak (2006) transform over a year ago. In the months since, however, I’ve started to realize that the assumptions-view is actually incompatible with much of my philosophy of modeling. To contrast my previous exposition (and to help me write up some reviewer responses), I want to go through a justification of the ON-transform as a first-order approximation of spatial structure.
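The transform mentioned above is compact enough to state as code. Here is a minimal sketch for death-birth updating on a k-regular graph, following the perturbation matrix of Ohtsuki & Nowak (2006); the function name and the example usage are my own, and the variants for other update rules in that paper use different perturbations:

```python
def ohtsuki_nowak_transform(A, k):
    """Ohtsuki-Nowak transform for death-birth updating on a k-regular
    graph (k > 2): replicator dynamics on the graph behave like inviscid
    replicator dynamics on the perturbed game A + B, where
    B_ij = (A_ii + A_ij - A_ji - A_jj) / (k - 2)."""
    n = len(A)
    return [[A[i][j] + (A[i][i] + A[i][j] - A[j][i] - A[j][j]) / (k - 2)
             for j in range(n)] for i in range(n)]

# Example: a 2x2 game on a degree-4 graph; only the off-diagonal
# entries shift, and they shift in opposite directions.
transformed = ohtsuki_nowak_transform([[1, 0], [0, 2]], k=4)
```

Note that the correction term vanishes as k grows, recovering the inviscid game, which fits the reading of the transform as a first-order correction for spatial structure.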

