Allegory of the replication crisis in algorithmic trading

One of the most interesting ongoing problems in metascience is the replication crisis. This is a methodological crisis around the difficulty of reproducing or replicating past studies. If we cannot repeat or recreate the results of a previous study, then it casts doubt on whether those ‘results’ were real or just artefacts of flawed methodology, bad statistics, or publication bias. If we view science as a collection of facts or empirical truths, then this can shake the foundations of science.
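To see how ‘bad statistics’ and publication bias interact, here is a minimal sketch (my illustration, not anything from the studies cited below): simulate many studies of a nonexistent effect and count how many clear the conventional p < 0.05 bar. If only those get published, the literature fills with findings that will not replicate. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_studies, n_subjects = 1000, 50

false_positives = 0
for _ in range(n_studies):
    # Two groups drawn from the SAME distribution: the true effect is zero.
    a = rng.normal(0, 1, n_subjects)
    b = rng.normal(0, 1, n_subjects)
    # Welch-style t statistic; |t| > 1.98 approximates p < 0.05 at these sizes.
    t = (a.mean() - b.mean()) / np.sqrt(
        a.var(ddof=1) / n_subjects + b.var(ddof=1) / n_subjects
    )
    if abs(t) > 1.98:
        false_positives += 1

print(f"'significant' results out of {n_studies} null studies: {false_positives}")
```

Roughly 5% of these null studies come out ‘significant’ by chance alone, and those are exactly the ones most likely to be written up.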

The replication crisis is most often associated with psychology — a field that seems to be having the most active and self-reflective engagement with the replication crisis — but also extends to fields like general medicine (Ioannidis, 2005a,b; 2016), oncology (Begley & Ellis, 2012), marketing (Hunter, 2001), economics (Camerer et al., 2016), and even hydrology (Stagge et al., 2019).

When I last wrote about the replication crisis back in 2013, I asked what science can learn from the humanities: specifically, what we can learn from memorable characters and fanfiction. From this perspective, a lack of replication was not the disease but the symptom of the deeper malady of poor theoretical foundations. When theories, models, and experiments are individual isolated silos, there is no inherent drive to replicate because the knowledge is not directly cumulative. Instead of forcing replication, we should aim to unify theories, make them more precise and cumulative and thus create a setting where there is an inherent drive to replicate.

More importantly, in a field with well-developed theory and large deductive components, a study can advance the field even if its observed outcome turns out to be incorrect. With a cumulative theory, it is more likely that we will develop new techniques or motivate new challenges or extensions to theory independent of the details of the empirical results. In a field where theory and experiment go hand-in-hand, a single paper can advance both our empirical grounding and our theoretical techniques.

I am certainly not the only one to suggest a lack of unifying, common, and cumulative theory as the cause of the replication crisis. But how do we act on this?

Can we just start mathematical modelling? In the case of the replication crisis in cancer research, will mathematical oncology help?

Not necessarily. But I’ll come back to this at the end. First, a story.

Let us look at a case study: algorithmic trading in quantitative finance. This is a field that is heavy in math and light on controlled experiments. In some ways, its methodology is the opposite of the dominant methodology of psychology or cancer research. It is all about doing math and writing code to predict the markets.

Yesterday on /r/algotrading, /u/chiefkul reported on his effort to reproduce 130+ papers about “predicting the stock market”. He coded them from scratch and found that “every single paper was either p-hacked, overfit [or] subsample[d] …OR… had a smidge of Alpha [that disappears with transaction costs]”.

There’s a replication crisis for you. Even the most pessimistic readings of the literature in psychology or medicine report significantly higher rates of successful replication. So let’s dig in a bit.
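To make the failure modes from that thread concrete, here is a minimal sketch (my illustration, not code from /u/chiefkul’s effort) of how overfitting and transaction costs conspire: backtest many purely random trading signals on random returns, keep the best in-sample performer, and check it out of sample and net of an assumed trading cost. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_strategies = 500, 200
cost_per_trade = 0.001  # assumed 10 bps cost per position change

# Market returns with zero true predictability.
market = rng.normal(0, 0.01, size=2 * n_days)
train, test = market[:n_days], market[n_days:]

best_sharpe, best_signal = -np.inf, None
for _ in range(n_strategies):
    # Random long/short signal: no real information content.
    signal = rng.choice([-1, 1], size=2 * n_days)
    strat = signal[:n_days] * train
    sharpe = strat.mean() / strat.std() * np.sqrt(252)
    if sharpe > best_sharpe:
        best_sharpe, best_signal = sharpe, signal

# Out-of-sample performance of the in-sample winner.
oos = best_signal[n_days:] * test
trades = np.abs(np.diff(best_signal[n_days:])) / 2  # position changes
oos_net = oos.copy()
oos_net[1:] -= trades * cost_per_trade

print(f"in-sample Sharpe of best strategy:   {best_sharpe:.2f}")
print(f"out-of-sample Sharpe (gross):        {oos.mean() / oos.std() * np.sqrt(252):.2f}")
print(f"out-of-sample Sharpe (net of costs): {oos_net.mean() / oos_net.std() * np.sqrt(252):.2f}")
```

Selecting the best of 200 coin-flip strategies typically produces an impressive in-sample Sharpe ratio, while the out-of-sample Sharpe hovers near zero and sinks further once costs are charged. That is the ‘smidge of Alpha that disappears with transaction costs’ in miniature.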

Twitter vs blogs and science advertising vs discussion

I read and write a lot of science outside the traditional medium of papers. Most often on blogs, Twitter, and Reddit. And these alternative media are colliding more and more with the ‘mainstream media’ of academic publishing. A particularly visible trend has been the Twitter paper thread: a collection of tweets that advertise a new paper and summarize its results. I’ve even written such a thread (5-6 March) for my recent paper on how to use cstheory to think about evolution.

Recently, David Basanta stumbled across an old (19 March) Twitter thread by Dan Quintana on why people should use such Twitter threads, instead of blog posts, to announce their papers. Given my passion for blogging, I think that David expected me to defend blogs against this assault. But instead of siding with David, I sided with Dan Quintana.

If you are going to be ‘announcing’ a paper via a thread, then I think you should use a Twitter thread, not a blog. At least, that is what I will try to stick to on TheEGG.

Yesterday, David wrote a blog post to elaborate on his position. So I thought that I would follow suit and write one to elaborate on mine. Unlike David’s blog, TheEGG has comments — so I encourage you, dear reader, to use those to disagree with me.


On Frankfurt’s Truth and Bullshit

In 2015 and 2016, as part of my new year reflections on the year prior, I wrote a post about the ‘year in books’. The first was about philosophy, psychology and political economy, and it was an unreasonably long and sprawling post. The second time, I decided to divide it into several posts, but only wrote the first one, on cancer: Neanderthals to the National Cancer Act to now. In this post, I want to return to two of the books that were supposed to be in the second post for that year: Harry G. Frankfurt’s On Bullshit and On Truth.

Reading these two books in 2015 might have been an unfortunate premonition of the post-2016 world. And I wonder if a lot of people have picked up Frankfurt’s essays since. But with a shortage of thoughts for this week, I thought it’s better late than never to share my impressions.

In this post I want to briefly summarize my reading of Frankfurt’s position. And then I’ll focus on a particular shortcoming: I don’t think Frankfurt focuses enough on how, and to what ends, Truth is used in practice. From the perspective of their relationship to investigation and inquiry, Truth and Bullshit start to seem much less distinct than Frankfurt makes them. And both start to look like a negative force — although, in the case of Truth, sometimes a necessary negative.

Cataloging a year of social blogging

With almost all of January behind us, I want to share the final summary of 2018. The first summary was on cancer and fitness landscapes; the second was on metamodeling. This third summary continues the philosophical trend of the second, but focuses on analyzing the roles of science, philosophy, and related concepts in society.

There were only 10 posts on the societal aspects of science and philosophy in 2018, with one of them not on this blog. But I think it is the most important topic to examine. And I wish that I had more patience and expertise to do these examinations.


Blogging, open science and the public intellectual

For the last half-year I’ve been keeping TheEGG to a strict weekly schedule. I’ve been making sure that at least one post comes out during every calendar week. At times this has been taxing. And of course this causes both reflection on why I blog and an urge to dip into old unfinished posts. This week I deliver both. Below is a linkdex of 7 posts from 2016 and earlier (with a few recent comments added here and there) commenting on how scientists and public intellectuals (whatever that phrase might mean) should approach blogging.

If you, dear reader, are a fellow science blogger then you might have seen these articles before. But I hope you might find it useful to revisit and reflect on some of them. I certainly found it insightful. And if you have any important updates to add to these links, then please share them.


Methods and morals for mathematical modeling

About a year ago, Vincent Cannataro emailed me asking about any resources that I might have on the philosophy and etiquette of mathematical modeling and inference. As regular readers of TheEGG know, this topic fascinates me. But as I was writing a reply to Vincent, I realized that I don’t have a single post that could serve as an entry point to my musings on the topic. Instead, I ended up sending him an annotated list of eleven links and a couple of book recommendations. As I scrambled for a post for this week, I realized that such an analytic linkdex should exist on TheEGG. So, in case others have interests similar to Vincent and me, I thought that it might be good to put together in one place some of the resources about metamodeling and related philosophy available on this blog.

This is not an exhaustive list, but it might still be relatively exhausting to read.

I’ve expanded slightly past the original 11 links (to 14) to highlight some more recent posts. The free association of the posts is structured slightly, with three sections: (1) classifying mathematical models, (2) pros and cons of computational models, and (3) ethics of models.


Separating theory from nonsense via communication norms, not Truth

Earlier this week on Twitter, Brian Skinner wrote an interesting thread on how to distinguish good theory from crackpottery. He started with a trait that both theorists and crackpots share: we have an “irrational self-confidence” — a belief that just by thinking we “can arrive at previously-unrealized truths about the world”. From this starting point, the two diverge in their use of evidence. A crackpot relies primarily on positive evidence: he thinks hard about a problem, arrives at a theory that feels right, and then publicizes the result.

A theorist, on the other prong, incorporates negative evidence: she thinks hard about a problem, arrives at a theory that feels right, and then proceeds to try to disprove that theory. She reads the existing literature and looks at the competing theories, takes time to understand them and compare them against her own. If any disagree with hers then she figures out why those theories are wrong. She pushes her theory to the extremes, looks at its limiting cases and checks them for agreement with existing knowledge. Only after her theory comes out unscathed from all these challenges does she publicize it.

For Skinner, this second prong is the definition of scholarship. In practice, coming up with a correct theory is mostly a painful process of discarding many of your own wrong attempts. A good theorist is thorough, methodical, and skeptical of their own ideas.

The terminology of crackpottery vs scholarship is probably overly harsh, as Skinner acknowledges. And in practice, somebody might be a good theorist in one domain but a crackpot elsewhere. As Malkym Lesdrae points out, there are many accomplished academics who are also crackpot theorists: “Most often it’s about things outside their field of specialty”. Thus, this ideal self-skepticism might be domain specific.

It is also a destructive ideal.

In other words, I disagreed with Skinner on the best way to separate good theory from nonsense. Mostly on the framing. Skinner crystallized our disagreement in a tweet: whereas he views self-skepticism as an obligation to the Truth, I view a similar sort of self-reflective behavior as a social obligation. I am committed to this latter view because I want to make sense of things like heuristic models, where truth is secondary to other modelling concerns. Where truth is not the most useful yardstick for checking the usefulness of a model. Where you hear Box’s slogan: “all models are wrong, but some are useful.”

Given the brief summary of Skinner’s view above — and please, Brian, correct me in the comments if I misrepresented your position — I want to use the rest of this post to sketch what I mean by self-reflective behavior as a social obligation.

Unity of knowing and doing in education and society

Traditionally, knowledge is separated from activity and passed down from teacher to student as disembodied information. For John Dewey, this tradition reinforces the false dichotomy between knowing and doing. A dichotomy that is socially destructive, and philosophically erroneous.

I largely agree with the above. The best experiences I’ve had of learning were through self-guided discovery, of wanting to solve a problem. This is, for example, one of the best ways to learn programming, or math, or a language, or writing, or nearly anything else. But in what way is this ‘doing’? Usually, ‘doing’ has a corporeal physicality to it. Thinking happens while you sit at your desk: in fact, you might as well be disembodied. Doing happens elsewhere and requires your body.

In this post, I want to briefly discuss the knowing-doing dichotomy. In particular, I’ll stress the importance of the social embodiment, rather than the physical embodiment, of ‘doing’. I’ll close with some vague speculations on the origins of this dichotomy and a dangling thread about how this might connect to the origins of science.


As a scientist, don’t speak to the public. Listen to the public.

There is a lot of advice written out there for aspiring science writers and bloggers. And as someone who writes science and about science, I read through this at times. The most common trend I see in this advice is to make your writing personal and to tell a story, with all the drama and plot-twists of a good page-turner. This is solid advice for good writing, and one that we shouldn’t restrict to writing about science but extend to writing the articles that are science. That would make reading and writing as a scientist (two of our biggest activities) much less boring. Yet we don’t do this. More importantly, we put up with reading hundreds of poorly written, boring papers.

So if scientists put up with awful writing, why do we have to write better for the public? I think that the answer to this reveals something very important about the role of science in society: who science serves and who it doesn’t. This affects how we should be thinking about activities like ‘science outreach’.

In this post, I want to put together some thoughts that have been going through my mind on funding, science and society. These are mostly half-baked and I am eager to be corrected. More importantly, I am hoping that this encourages you, dear reader, to share any thoughts that this discussion sparks.


Poor reasons for preprints & post-publication peer-review

Last week, I revived the blog with some reflections on open science. In particular, I went into the case for preprints and the problem with the academic publishing system. This week, I want to continue this thread by examining three common arguments for preprints: speed, feedback, and public access. I think that these arguments are often motivated in the wrong way. In their standard presentation, they are bad arguments for a good idea. By pointing out these perceived shortcomings, I hope that we can develop more convincing arguments for preprints. Or maybe methods of publication that are even better than the current approach to preprints.

These thoughts are not completely formed, and I am eager to refine them in follow-up posts. As it stands, this is more of a hastily written rant.
