Allegory of the replication crisis in algorithmic trading

One of the most interesting ongoing problems in metascience right now is the replication crisis. This is a methodological crisis around the difficulty of reproducing or replicating past studies. If we cannot repeat or recreate the results of a previous study, then it casts doubt on whether those ‘results’ were real or just artefacts of flawed methodology, bad statistics, or publication bias. If we view science as a collection of facts or empirical truths, then this can shake the foundations of science.

The replication crisis is most often associated with psychology — a field that seems to be having the most active and self-reflective engagement with the replication crisis — but also extends to fields like general medicine (Ioannidis, 2005a,b; 2016), oncology (Begley & Ellis, 2012), marketing (Hunter, 2001), economics (Camerer et al., 2016), and even hydrology (Stagge et al., 2019).

When I last wrote about the replication crisis back in 2013, I asked what science can learn from the humanities: specifically, what we can learn from memorable characters and fanfiction. From this perspective, a lack of replication was not the disease but the symptom of the deeper malady of poor theoretical foundations. When theories, models, and experiments are individual isolated silos, there is no inherent drive to replicate because the knowledge is not directly cumulative. Instead of forcing replication, we should aim to unify theories, make them more precise and cumulative and thus create a setting where there is an inherent drive to replicate.

More importantly, in a field with well-developed theory and large deductive components, a study can advance the field even if its observed outcome turns out to be incorrect. With a cumulative theory, it is more likely that we will develop new techniques or motivate new challenges or extensions to theory independent of the details of the empirical results. In a field where theory and experiment go hand-in-hand, a single paper can advance both our empirical grounding and our theoretical techniques.

I am certainly not the only one to suggest a lack of unifying, common, and cumulative theory as the cause of the replication crisis. But how do we act on this?

Can we just start mathematical modelling? In the case of the replication crisis in cancer research, will mathematical oncology help?

Not necessarily. But I’ll come back to this at the end. First, a story.

Let us look at a case study: algorithmic trading in quantitative finance. This is a field that is heavy in math and light on controlled experiments. In some ways, its methodology is the opposite of the dominant methodology of psychology or cancer research. It is all about doing math and writing code to predict the markets.

Yesterday on /r/algotrading, /u/chiefkul reported on his effort to reproduce 130+ papers about “predicting the stock market”. He coded them from scratch and found that “every single paper was either p-hacked, overfit [or] subsample[d] …OR… had a smidge of Alpha [that disappears with transaction costs]”.

There’s a replication crisis for you. Even the most pessimistic readings of the literature in psychology or medicine produce significantly higher levels of successful replication. So let’s dig in a bit.
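To see how easily phantom alpha appears, here is a minimal sketch — my own toy illustration with made-up numbers, not /u/chiefkul's code — of backtest overfitting: search a thousand arbitrary strategies on the same price history (a pure random walk, so there is nothing to find), keep the best one, and watch its in-sample performance evaporate out-of-sample and under transaction costs.

```python
# Toy illustration of backtest overfitting (my own sketch, not /u/chiefkul's code):
# prices are a pure random walk, so no strategy has real alpha. Searching many
# strategies on the same history and keeping the best one still "finds" alpha
# in-sample -- and it vanishes out-of-sample and under transaction costs.
import numpy as np

rng = np.random.default_rng(0)
T = 2000                                       # trading days
returns = rng.normal(0, 0.01, size=T)          # daily returns with no signal at all
train, test = returns[:1000], returns[1000:]

def strategy_pnl(positions, rets, cost_per_trade=0.0):
    """Daily P&L of holding +1/-1 positions, minus a cost whenever the position flips."""
    trades = np.abs(np.diff(positions, prepend=positions[0]))
    return positions * rets - cost_per_trade * trades

def sharpe(pnl):
    return pnl.mean() / pnl.std()

# "publish" the best of 1000 arbitrary strategies fit to the same training data
best_sharpe, best_positions = -np.inf, None
for _ in range(1000):
    positions = rng.choice([-1, 1], size=T)
    s = sharpe(strategy_pnl(positions[:1000], train))
    if s > best_sharpe:
        best_sharpe, best_positions = s, positions

print("in-sample Sharpe:             ", sharpe(strategy_pnl(best_positions[:1000], train)))
print("out-of-sample Sharpe:         ", sharpe(strategy_pnl(best_positions[1000:], test)))
print("out-of-sample Sharpe w/ costs:", sharpe(strategy_pnl(best_positions[1000:], test, 0.001)))
```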

Read more of this post

Blogging community of computational and mathematical oncologists

A few weeks ago, David Basanta reached out to me (and many other members of the mathematical oncology community) about building a community blog together. This week, to coincide with the Society for Mathematical Biology meeting in Montreal, we launched the blog. In keeping with the community focus, we have an editorial board of 8 people that includes (in addition to David and me): Christina Curtis, Elana Fertig, Stacey Finley, Jakob Nikolas Kather, Jacob G. Scott, and Jeffrey West. The theme is computational and mathematical oncology, but we welcome contributions from all nearby disciplines.

The behind-the-scenes discussion building up to this launch was one of the motivators for my post on twitter vs blogs and science advertising versus discussion. And as you might expect, dear reader, it was important to me that this new community blog wouldn’t be just about science outreach and advertising of completed work. For me — and I think many of the editors — it is important that the blog is a place for science engagement and for developing new ideas in the open. A way to peel back the covers that hide how science is done and break the silos that inhibit a collaborative and cooperative atmosphere. A way not only to speak at the public or other scientists, but also to listen.

For me, the blog is a challenge to the community. A challenge to engage in more flexible, interactive, and inclusive development of new ideas than is possible with traditional journals. While also allowing for a deeper, more long-form and structured discussion than is possible with twitter. If you’ve ever written a detailed research email, had a long discussion on Slack, or been part of an exciting journal club, lab meeting, or seminar, you know the amount of useful discussion that is foundational to science but that seldom appears in public. My hope is that we can make these discussions more public and more beneficial to the whole community.

Before pushing for the project, David made sure that he knew the lay of the land. He assembled a list of the existing blogs on computational and mathematical oncology. In our welcome post, I made sure to highlight a few of the examples of our community members developing new ideas, sharing tools and techniques, and pushing beyond outreach and advertising. But since we wanted the welcome post to be short, there was not the opportunity for a more thorough survey of our community.

In this post, I want to provide a more detailed — although never complete nor exhaustive — snapshot of the blogging community of computational and mathematical oncologists. At least the part of it that I am familiar with. If I missed you then please let me know. This is exactly what the comments on this post are for: expanding our community.

Read more of this post

Twitter vs blogs and science advertising vs discussion

I read and write a lot of science outside the traditional medium of papers. Most often on blogs, twitter, and Reddit. And these alternative media are colliding more and more with the ‘mainstream media’ of academic publishing. A particularly visible trend has been the twitter paper thread: a collection of tweets that advertise a new paper and summarize its results. I’ve even written such a thread (5-6 March) for my recent paper on how to use cstheory to think about evolution.

Recently, David Basanta stumbled across an old (19 March) twitter thread by Dan Quintana on why people should use such twitter threads, instead of blog posts, to announce their papers. Given my passion for blogging, I think that David expected me to defend blogs against this assault. But instead of siding with David, I sided with Dan Quintana.

If you are going to be ‘announcing’ a paper via a thread then I think you should use a twitter thread, not a blog. At least, that is what I will try to stick to on TheEGG.

Yesterday, David wrote a blog post to elaborate on his position. So I thought that I would follow suit and write one to elaborate mine. Unlike David’s blog, TheEGG has comments — so I encourage you, dear reader, to use those to disagree with me.

Read more of this post

Description before prediction: evolutionary games in oncology

As I discussed towards the end of an old post on cross-validation and prediction: we don’t always want to have prediction as our primary goal, or metric of success. In fact, I think that if a discipline has not found a vocabulary for its basic terms, a grammar for combining those terms, and a framework for collecting, interpreting, and/or translating experimental practice into those terms then focusing on prediction can actually slow us down or push us in the wrong direction. To adapt Knuth: I suspect that premature optimization of predictive potential is the root of all evil.

We need to first have a good framework for describing and summarizing phenomena before we set out to build theories within that framework for predicting phenomena.

In this brief post, I want to ask if evolutionary games in oncology are ready for building predictive models. Or if they are still in need of establishing themselves as a good descriptive framework.

Read more of this post

Fighting about frequency and randomly generating fitness landscapes

A couple of months ago, I was in Cambridge for the Evolution Evolving conference. It was a lot of fun, and it was nice to catch up with some familiar faces and meet some new ones. My favourite talk was Karen Kovaka’s “Fighting about frequency”. It was an extremely well-delivered talk on the philosophy of science. And it engaged with a topic that has been very important to discussions of my own recent work, although in my case it is on a much smaller scale than the general phenomenon that Kovaka was concerned with.

Let me first set up my own teacup, before discussing the more general storm.

Recently, I’ve had a number of chances to present my work on computational complexity as an ultimate constraint on evolution. And some questions have come up again and again after several of the presentations. I want to address one of these persistent questions in this post.

How common are hard fitness landscapes?

This question has come up during review, presentations, and emails (most recently from Jianzhi Zhang’s reading group). I’ve spent some time addressing it in the paper. But it is not a question with a clear answer. So unsurprisingly, my comments have not been clear. Hence, I want to use this post to add some clarity.
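To make the question concrete, here is a toy sketch — my own illustration, not code from the paper — that ‘randomly generates’ a fitness landscape in the simplest way (an uncorrelated house-of-cards assignment over binary genotypes) and measures how many steps a fittest-mutant adaptive walk needs to reach a local peak. On landscapes drawn this way the walks come out very short, which is part of why the answer to ‘how common are hard landscapes?’ depends entirely on the distribution you sample landscapes from.

```python
# Toy sketch (my own, not from the paper): draw an uncorrelated "house of cards"
# fitness landscape over binary genotypes, then measure how long fittest-mutant
# adaptive walks take to reach a local peak.
import random

random.seed(0)

def random_landscape(n):
    """Assign an i.i.d. uniform fitness to each of the 2^n genotypes."""
    return [random.random() for _ in range(2 ** n)]

def fittest_mutant_walk(fitness, n, start):
    """Follow the fittest one-bit mutant until no mutant is fitter; return walk length."""
    genotype, steps = start, 0
    while True:
        neighbours = [genotype ^ (1 << i) for i in range(n)]   # all single-bit flips
        best = max(neighbours, key=lambda g: fitness[g])
        if fitness[best] <= fitness[genotype]:
            return steps                                        # local peak reached
        genotype, steps = best, steps + 1

n = 15                                                          # 2^15 = 32768 genotypes
fitness = random_landscape(n)
walk_lengths = [fittest_mutant_walk(fitness, n, random.randrange(2 ** n))
                for _ in range(1000)]
print("mean steps to a local peak:", sum(walk_lengths) / len(walk_lengths))
```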

Read more of this post

Hiding behind chaos and error in the double pendulum

If you want a visual intuition for just how unpredictable chaotic dynamics can be then the go-to toy model is the double pendulum. There are lots of great simulations (and some physical implementations) of the double pendulum online. Recently, /u/abraxasknister posted such a simulation on the /r/physics subreddit and quickly attracted a lot of attention.

In their simulation, /u/abraxasknister has a fixed center (black dot) that the first mass (red dot) is attached to (by an invisible rigid massless bar). The second mass (blue dot) is then attached to the first mass (also by an invisible rigid massless bar). They then release these two masses from rest at some initial height and watch what happens.

The resulting dynamics are at right.

It is certainly unpredictable and complicated. Chaotic? Most importantly, it is obviously wrong.

But because the double pendulum is a famous chaotic system, some people did not want to acknowledge that there is an obvious mistake. They wanted to hide behind chaos: they claimed that for a complex system, we cannot possibly have intuitions about how the system should behave.

In this post, I want to discuss the error of hiding behind chaos, and how the distinction between microdynamics and global properties lets us catch /u/abraxasknister’s mistake.
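To make that concrete, here is a minimal sketch — my own toy code with made-up parameters, not /u/abraxasknister’s simulation — that integrates a double pendulum and tracks its total energy. The microdynamics are chaotic, but the global property is simple: released from rest and without friction, the energy must stay (numerically almost) constant, so a large or growing drift is a bug you can catch without any intuition for the trajectory itself.

```python
# A minimal sketch of the global-property check (my own toy code, not
# /u/abraxasknister's simulation): integrate a double pendulum with RK4 and
# track total mechanical energy. The trajectory is chaotic and unpredictable,
# but a frictionless pendulum released from rest must conserve energy -- a
# large or growing drift flags a bug in the microdynamics.
import numpy as np

g, m1, m2, l1, l2 = 9.81, 1.0, 1.0, 1.0, 1.0      # illustrative parameters

def derivatives(state):
    """Equations of motion; angles are measured from the downward vertical."""
    th1, w1, th2, w2 = state
    d = th1 - th2
    den = 2 * m1 + m2 - m2 * np.cos(2 * d)
    a1 = (-g * (2 * m1 + m2) * np.sin(th1) - m2 * g * np.sin(th1 - 2 * th2)
          - 2 * np.sin(d) * m2 * (w2 ** 2 * l2 + w1 ** 2 * l1 * np.cos(d))) / (l1 * den)
    a2 = (2 * np.sin(d) * (w1 ** 2 * l1 * (m1 + m2) + g * (m1 + m2) * np.cos(th1)
          + w2 ** 2 * l2 * m2 * np.cos(d))) / (l2 * den)
    return np.array([w1, a1, w2, a2])

def energy(state):
    """Total mechanical energy: the global property a correct simulation preserves."""
    th1, w1, th2, w2 = state
    kinetic = (0.5 * m1 * (l1 * w1) ** 2
               + 0.5 * m2 * ((l1 * w1) ** 2 + (l2 * w2) ** 2
                             + 2 * l1 * l2 * w1 * w2 * np.cos(th1 - th2)))
    potential = -(m1 + m2) * g * l1 * np.cos(th1) - m2 * g * l2 * np.cos(th2)
    return kinetic + potential

def rk4_step(state, dt):
    k1 = derivatives(state)
    k2 = derivatives(state + dt * k1 / 2)
    k3 = derivatives(state + dt * k2 / 2)
    k4 = derivatives(state + dt * k3)
    return state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

state = np.array([2.0, 0.0, 2.0, 0.0])            # released from rest at some height
e0, dt = energy(state), 0.001
for _ in range(20000):                            # 20 simulated seconds
    state = rk4_step(state, dt)
print("relative energy drift:", abs(energy(state) - e0) / abs(e0))
```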
Read more of this post

Introduction to Algorithmic Biology: Evolution as Algorithm

As Aaron Roth wrote on Twitter — and as I bet with my career: "Rigorously understanding evolution as a computational process will be one of the most important problems in theoretical biology in the next century. The basics of evolution are many students’ first exposure to ‘computational thinking’ — but we need to finish the thought!"

Last week, I tried to continue this thought for Oxford students at a joint meeting of the Computational Society and Biological Society. On May 22, I gave a talk on algorithmic biology. I want to use this post to share my (shortened) slides as a pdf file and give a brief overview of the talk.

Winding path in a hard semi-smooth landscape

If you didn’t get a chance to attend, maybe the title and abstract will get you reading further:

Algorithmic Biology: Evolution is an algorithm; let us analyze it like one.

Evolutionary biology and theoretical computer science are fundamentally interconnected. In the work of Charles Darwin and Alfred Russel Wallace, we can see the emergence of concepts that theoretical computer scientists would later hold as central to their discipline. Ideas like asymptotic analysis, the role of algorithms in nature, distributed computation, and analogy from man-made to natural control processes. By recognizing evolution as an algorithm, we can continue to apply the mathematical tools of computer science to solve biological puzzles – to build an algorithmic biology.

One of these puzzles is open-ended evolution: why do populations continue to adapt instead of getting stuck at local fitness optima? Or alternatively: what constraint prevents evolution from finding a local fitness peak? Many solutions have been proposed to this puzzle, with most being proximal – i.e. depending on the details of the particular population structure. But computational complexity provides an ultimate constraint on evolution. I will discuss this constraint, and the positive aspects of the resultant perpetual maladaptive disequilibrium. In particular, I will explain how we can use this to understand both ongoing long-term evolution experiments in bacteria and the evolution of costly learning and cooperation in populations of complex organisms like humans.

Unsurprisingly, I’ve written about all these topics already on TheEGG, and so my overview of the talk will involve a lot of links back to previous posts. In this way, this post can serve as an analytic linkdex on algorithmic biology.
Read more of this post

British agricultural revolution gave us evolution by natural selection

This Wednesday, I gave a talk on algorithmic biology to the Oxford Computing Society. One of my goals was to show how seemingly technology-oriented disciplines (such as computer science) can produce foundational theoretical, philosophical and scientific insights. So I started the talk with the relationship between domestication and natural selection. Something that I’ve briefly discussed on TheEGG in the past.

Today we might discuss artificial selection or domestication (or even evolutionary oncology) as applying the principles of natural selection to achieve human goals. This is only because we now take Darwin’s work as given. At the time that he was writing, however, Darwin actually had to make his argument in the other direction. Darwin’s argument proceeds from looking at the selection algorithms used by humans and then abstracting them to focus only on the algorithm and not on the agent carrying it out. Having made this abstraction, he can implement the breeder by the distributed struggle for existence and thus get natural selection.
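To make the abstraction concrete, here is a toy sketch — my own illustration with made-up traits and scoring functions, not a model from Darwin or anyone else — in which the selection-and-reproduction loop is one and the same algorithm, and only the thing playing the breeder changes: a human preference in one run, a stand-in for the struggle for existence in the other.

```python
# Toy sketch of Darwin's abstraction (my own illustration; the traits and
# scoring functions are made up): the selection algorithm below never changes,
# only the `score` callback -- the "agent" doing the selecting -- does.
import random

random.seed(0)

def select_and_reproduce(population, score, generations=50, mutation=0.05):
    """Generic selection: keep the top half by `score`, then copy each survivor twice with mutation."""
    for _ in range(generations):
        survivors = sorted(population, key=score, reverse=True)[:len(population) // 2]
        population = [trait + random.gauss(0, mutation)
                      for trait in survivors for _ in (0, 1)]
    return population

# a population described by one quantitative trait (say, wool length)
population = [random.gauss(1.0, 0.1) for _ in range(100)]

breeder = lambda trait: trait                   # artificial selection: a breeder who prefers longer wool
environment = lambda trait: -abs(trait - 0.5)   # 'struggle for existence': survival favours an optimum

bred = select_and_reproduce(population, breeder)
wild = select_and_reproduce(population, environment)
print("selected by a breeder:   mean trait =", sum(bred) / len(bred))
print("selected by environment: mean trait =", sum(wild) / len(wild))
```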

The inspiration is clearly from the technological to the theoretical. But there is a problem with my story.

Domestication of plants and animals is ancient. Old enough that we have cancers that arose in our domesticated helpers 11,000 years ago and persist to this day. Domestication in general — the fruit of the first agricultural revolution — can hardly qualify as a new technology in Darwin’s day. It would have been just as familiar to Aristotle, and yet he thought species were eternal.

Why wasn’t Aristotle or any other ancient philosopher inspired by the agriculture and animal husbandry of their day to arrive at the same theory as Darwin?

The ancients didn’t arrive at the same view because it wasn’t the domestication of the first agricultural revolution that inspired Darwin. It was something much more contemporary to him. Darwin was inspired by the British agricultural revolution of the 18th and early 19th century.

In this post, I want to sketch this connection between the technological development of the Georgian era and the theoretical breakthroughs in natural science in the subsequent Victorian era. As before, I’ll focus on evolution and algorithm.

Read more of this post

Four stages in the relationship of computer science to other fields

This weekend, Oliver Schneider — an old high-school friend — is visiting me in the UK. He is a computer scientist working on human-computer interaction and was recently appointed as an assistant professor at the Department of Management Sciences, University of Waterloo. Back in high-school, Oliver and I would occasionally sneak out of class and head to the University of Saskatchewan to play counter strike in the campus internet cafe. Now, Oliver builds haptic interfaces that can represent virtual worlds physically so vividly that a blind person can play a first-person shooter like counter strike. Take a look:

Now, dear reader, can you draw a connecting link between this and the algorithmic biology that I typically blog about on TheEGG?

I would not be able to find such a link. And that is what makes computer science so wonderful. It is an extremely broad discipline that encompasses many areas. I might be reading a paper on evolutionary biology or fixed-point theorems, while Oliver reads a paper on i/o-psychology or how to cut 150 micron-thick glass. Yet we still bring a computational flavour to the fields that we interface with.

A few years ago, Karp (2011; see also Xu & Tu, 2011) wrote a nice piece about the myriad ways in which computer science can interact with other disciplines. He was coming at it from a theorist’s perspective — that is compatible with TheEGG but maybe not as much with Oliver’s work — and the bias shows. But I think that the stages he identified in the relationship between computer science and other fields are still enlightening.

In this post, I want to share how Xu & Tu (2011) summarize Karp’s (2011) four phases of the relationship between computer science and other fields: (1) numerical analysis, (2) computational science, (3) e-Science, and (4) the algorithmic lens. I’ll try to motivate and prototype these stages with some of my own examples.
Read more of this post

On Frankfurt’s Truth and Bullshit

In 2015 and 2016, as part of my new year reflections on the year prior, I wrote a post about the ‘year in books’. The first was about philosophy, psychology and political economy, and it was unreasonably long and sprawling as a post. The second time, I decided to divide it into several posts, but only wrote the first one on cancer: Neanderthals to the National Cancer Act to now. In this post, I want to return to two of the books that were supposed to be in the second post for that year: Harry G. Frankfurt’s On Bullshit and On Truth.

Reading these two books in 2015 might have been an unfortunate premonition of the post-2016 world. And I wonder if a lot of people have picked up Frankfurt’s essays since. But with a shortage of thoughts for this week, I thought it’s better late than never to share my impressions.

In this post I want to briefly summarize my reading of Frankfurt’s position. And then I’ll focus on a particular shortcoming: I don’t think Frankfurt focuses enough on how Truth is used in practice, and what it is used for. From the perspective of their relationship to investigation and inquiry, Truth and Bullshit start to seem much less distinct than Frankfurt makes them. And both start to look like the negative force — although in the case of Truth, sometimes a necessary negative.
Read more of this post