Elements of biological computation & stochastic thermodynamics of life

This week, I was visiting the Santa Fe Institute for a workshop organized by Albert Kao, Jessica Flack, and David Wolpert on “What is biological computation?” (11 – 13 September 2019). It was an ambitious question, and I don’t think that we were able to answer it in just three days of discussion, but we all certainly learnt a lot.

At least, I know that I learnt a lot of new things.

The workshop had around 34 attendees from across the world, but from the reaction on twitter it seems like many more would have been eager to attend. Hence, both to help synchronize the memory networks of all the participants and to share with those who couldn’t attend, I want to use this series of blog posts to jot down some of the topics that were discussed at the meeting.

During the conference, I was live tweeting. So if you prefer my completely raw, unedited impressions in tweet form then you can take a look at those threads for Wednesday (14 tweets), Thursday (15 tweets), and Friday (31 tweets). The workshop itself was organized around discussion, and the presentations were only seeds. Unfortunately, my live tweeting and this post are limited primarily to the presentations. But I will follow up with some synthesis and reflection in the future.

Due to the vast amount discussed during the workshop, I will focus this post on just the first day. I’ll follow with posts on the other days later.

It is also important to note that this is the workshop through my eyes. And thus this retelling is subject to the limits of my understanding, notes, and recollection. In particular, I wasn’t able to follow the stochastic thermodynamics that dominated the afternoon of the first day. And although I do provide some retelling, I hope that I can convince one of the experts to provide a more careful blog post on the topic.

Read more of this post

Web of C-lief: conjectures vs. model assumptions vs. scientific beliefs

A sketch of the theoretical computer science Web of C-lief woven by the non-contradiction spider.

In his 1951 paper on the “Two Dogmas of Empiricism”, W.V.O. Quine introduced the Web of Belief as a metaphor for his holistic epistemology of scientific knowledge. With this metaphor, Quine aimed to give an alternative to the reductive atomising epistemology of the logical empiricists. For Quine, no “fact” is an island and no experiment can be focused in to resolve just one hypothesis. Instead, each of our beliefs forms part of an interconnected web, and when a new belief conflicts with an existing one then this is a signal for us to refine some belief. But this signal does not unambiguously single out a specific belief that we should refine. It only points to a set of beliefs that are incompatible with our new one, or that, if refined, could bring our belief system back into coherence. We then use alternative mechanisms like simplicity or minimality (or some aesthetic consideration) to choose which belief to update. Usually, we are more willing to give up beliefs that are peripheral to the web — that are connected to or change fewer other beliefs — than the beliefs that are central to our web.

In this post, I want to play with Quine’s web of belief metaphor in the context of science. This will force us to restrict it to specific domains instead of the grand theory that Quine intended. From this, I can then adapt the metaphor from belief in science to c-liefs in mathematics. This will let me discuss how complexity class separation conjectures are structured in theoretical computer science and why this is fundamentally different from model assumptions in natural science.

So let’s start with a return to the relevant philosophy.

Read more of this post

Idealization vs abstraction for mathematical models of evolution

This week I was in Turku, Finland for the annual congress of the European Society for Evolutionary Biology. I presented in the symposium on mathematical models in evolutionary biology organized by Guy Cooper, Matishalin Patel, Tom Scott, and Asher Leeks. It was fun. It was also a big challenge given the short ten-minute format. I decided to use my ten minutes to try to convince the audience that we should consider not just idealized models but also abstractions. So after my typical introduction of computational vs algorithmic biology, I switched to talking about triangles. If you would like, dear reader, then you can watch the whole session online (or grab my slides as pdf). In this post, I just want to focus on the distinction between idealized vs. abstract models.

Just as in my ESEB talk, I’ll use triangles to explain the distinction between idealized vs. abstract models.

Read more of this post

Allegory of the replication crisis in algorithmic trading

One of the most interesting ongoing problems in metascience right now is the replication crisis. This is a methodological crisis around the difficulty of reproducing or replicating past studies. If we cannot repeat or recreate the results of a previous study then it casts doubt on whether those ‘results’ were real or just artefacts of flawed methodology, bad statistics, or publication bias. If we view science as a collection of facts or empirical truths then this can shake the foundations of science.

The replication crisis is most often associated with psychology — a field that seems to be having the most active and self-reflective engagement with the replication crisis — but also extends to fields like general medicine (Ioannidis, 2005a,b; 2016), oncology (Begley & Ellis, 2012), marketing (Hunter, 2001), economics (Camerer et al., 2016), and even hydrology (Stagge et al., 2019).

When I last wrote about the replication crisis back in 2013, I asked what science can learn from the humanities: specifically, what we can learn from memorable characters and fanfiction. From this perspective, a lack of replication was not the disease but the symptom of the deeper malady of poor theoretical foundations. When theories, models, and experiments are individual isolated silos, there is no inherent drive to replicate because the knowledge is not directly cumulative. Instead of forcing replication, we should aim to unify theories, make them more precise and cumulative, and thus create a setting where there is an inherent drive to replicate.

More importantly, in a field with well-developed theory and large deductive components, a study can advance the field even if its observed outcome turns out to be incorrect. With a cumulative theory, it is more likely that we will develop new techniques or motivate new challenges or extensions to theory independent of the details of the empirical results. In a field where theory and experiment go hand-in-hand, a single paper can advance both our empirical grounding and our theoretical techniques.

I am certainly not the only one to suggest a lack of unifying, common, and cumulative theory as the cause of the replication crisis. But how do we act on this?

Can we just start mathematical modelling? In the case of the replication crisis in cancer research, will mathematical oncology help?

Not necessarily. But I’ll come back to this at the end. First, a story.

Let us look at a case study: algorithmic trading in quantitative finance. This is a field that is heavy on math and light on controlled experiments. In some ways, its methodology is the opposite of the dominant methodology of psychology or cancer research. It is all about doing math and writing code to predict the markets.

Yesterday on /r/algotrading, /u/chiefkul reported on his effort to reproduce 130+ papers about “predicting the stock market”. He coded them from scratch and found that “every single paper was either p-hacked, overfit [or] subsample[d] …OR… had a smidge of Alpha [that disappears with transaction costs]”.

There’s a replication crisis for you. Even the most pessimistic readings of the literature in psychology or medicine produce significantly higher levels of successful replication. So let’s dig in a bit.
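To get a concrete feel for that last failure mode (a smidge of alpha that vanishes once you pay to trade), here is a minimal toy sketch in Python. It is not a reconstruction of any of those 130+ papers: the i.i.d. Gaussian market, the hypothetical signal with a tiny genuine correlation to each day’s return, and the 5-basis-point cost per unit traded are all assumptions chosen just for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy market (assumption): ~10 years of i.i.d. daily returns, zero drift.
n_days = 2500
market = rng.normal(loc=0.0, scale=0.01, size=n_days)

# Hypothetical strategy: its signal has a small genuine correlation
# (rho = 0.02) with the same day's return -- a "smidge of Alpha".
rho = 0.02
signal = rho * (market / 0.01) + np.sqrt(1 - rho**2) * rng.normal(size=n_days)
position = np.sign(signal)  # long or short one unit each day

gross = position * market  # daily P&L before costs

# Made-up cost model: 5 basis points per unit traded on position changes.
cost_per_unit = 0.0005
costs = cost_per_unit * np.abs(np.diff(position, prepend=0.0))
net = gross - costs

annualize = np.sqrt(252)
print(f"gross Sharpe: {annualize * gross.mean() / gross.std():+.2f}")
print(f"net Sharpe:   {annualize * net.mean() / net.std():+.2f}")
```

With these made-up numbers, the gross Sharpe ratio comes out small but positive, while flipping the position roughly every other day pushes the net Sharpe below zero. A signal this weak trades too often to survive even modest frictions, which is one way a reported alpha can be real in-sample and still worthless in practice.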

Read more of this post

Fighting about frequency and randomly generating fitness landscapes

A couple of months ago, I was in Cambridge for the Evolution Evolving conference. It was a lot of fun, and it was nice to catch up with some familiar faces and meet some new ones. My favourite talk was Karen Kovaka’s “Fighting about frequency”. It was an extremely well-delivered talk on the philosophy of science. And it engaged with a topic that has been very important to discussions of my own recent work, although in my case on a much smaller scale than the general phenomenon that Kovaka was concerned with.

Let me first set up my own teacup, before discussing the more general storm.

Recently, I’ve had a number of chances to present my work on computational complexity as an ultimate constraint on evolution. And some questions have come up again and again after several of the presentations. I want to address one of these persistent questions in this post.

How common are hard fitness landscapes?

This question has come up during review, presentations, and emails (most recently from Jianzhi Zhang’s reading group). I’ve spent some time addressing it in the paper. But it is not a question with a clear answer. So unsurprisingly, my comments have not been clear. Hence, I want to use this post to add some clarity.
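To make that concrete, it helps to pin down what ‘randomly generating’ a fitness landscape even means. Below is a minimal sketch of one standard choice, Kauffman’s NK model with i.i.d. uniform fitness contributions and cyclic consecutive gene neighbourhoods: it samples a random landscape and exhaustively counts its local fitness peaks. The function names and the parameters n = 10, k = 2 are my own choices for illustration, and the sampling distribution itself is exactly what is at issue in the question above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_nk_fitness(n, k):
    """Kauffman's NK model: gene i's contribution depends on genes
    i, i+1, ..., i+k (cyclically), via an i.i.d. Uniform[0, 1) lookup table."""
    tables = rng.random((n, 2 ** (k + 1)))

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            idx = 0
            for j in range(k + 1):  # pack the neighbourhood into a table index
                idx = (idx << 1) | genotype[(i + j) % n]
            total += tables[i, idx]
        return total / n

    return fitness

def count_local_peaks(n, fitness):
    """Count genotypes with no strictly fitter one-bit-flip neighbour."""
    peaks = 0
    for g in itertools.product((0, 1), repeat=n):
        f = fitness(g)
        if all(f >= fitness(g[:i] + (1 - g[i],) + g[i + 1:]) for i in range(n)):
            peaks += 1
    return peaks

n, k = 10, 2
print(f"local peaks in one random NK({n},{k}) landscape:",
      count_local_peaks(n, random_nk_fitness(n, k)))
```

Whatever count this prints is a fact about the uniform NK distribution, not about fitness landscapes in general. Sample from a different distribution, or structure the gene interactions differently, and hard landscapes can look common or vanishingly rare. That is part of why I don’t think ‘how common’ has a distribution-free answer.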

Read more of this post

Four stages in the relationship of computer science to other fields

This weekend, Oliver Schneider — an old high-school friend — is visiting me in the UK. He is a computer scientist working on human-computer interaction and was recently appointed as an assistant professor at the Department of Management Sciences, University of Waterloo. Back in high-school, Oliver and I would occasionally sneak out of class and head to the University of Saskatchewan to play Counter-Strike in the campus internet cafe. Now, Oliver builds haptic interfaces that can represent virtual worlds physically so vividly that a blind person can play a first-person shooter like Counter-Strike. Take a look:

Now, dear reader, can you draw a connecting link between this and the algorithmic biology that I typically blog about on TheEGG?

I would not be able to find such a link. And that is what makes computer science so wonderful. It is an extremely broad discipline that encompasses many areas. I might be reading a paper on evolutionary biology or fixed-point theorems, while Oliver reads a paper on i/o-psychology or how to cut 150 micron-thick glass. Yet we still bring a computational flavour to the fields that we interface with.

A few years ago, Karp (2011; see Xu & Tu, 2011) wrote a nice piece about the myriad ways in which computer science can interact with other disciplines. He was coming at it from a theorist’s perspective — that is compatible with TheEGG but maybe not as much with Oliver’s work — and the bias shows. But I think that the stages he identified in the relationship between computer science and other fields are still enlightening.

In this post, I want to share how Xu & Tu (2011) summarize Karp’s (2011) four phases of the relationship between computer science and other fields: (1) numerical analysis, (2) computational science, (3) e-Science, and (4) the algorithmic lens. I’ll try to motivate and prototype these stages with some of my own examples.

Read more of this post

Coarse-graining vs abstraction and building theory without a grounding

Back in September 2017, Sandy Anderson was tweeting about the mathematical oncology revolution. To which Noel Aherne replied with a thorny observation that “we have been curing cancers for decades with radiation without a full understanding of all the mechanisms”.

This led to a wide-ranging discussion and clarification of what is meant by terms like mechanism. I had meant to blog about these conversations when they were happening, but the post fell through the cracks and into the long to-write list.

This week, to continue celebrating Rockne et al.’s 2019 Mathematical Oncology Roadmap, I want to revisit this thread.

And not just in cancer, although my starting example will focus on VEGF and cancer.

I want to focus on a particular point that came up in my discussion with Paul Macklin: what is the difference between coarse-graining and abstraction? In the process, I will argue that if we want to build mechanistic models, we should aim not at explaining new unknown effects but rather at effects where we already have great predictive power from simple effective models.

Since Paul and I often have useful disagreements on twitter, hopefully writing about it on TheEGG will also prove useful.

Read more of this post

Quick introduction: the algorithmic lens

Computers are a ubiquitous tool in modern research. We use them for everything from running simulation experiments and controlling physical experiments to analyzing and visualizing data. For almost any field ‘X’ there is probably a subfield of ‘computational X’ that uses and refines these computational tools to further research in X. This is very important work and I think it should be an integral part of all modern research.

But this is not the algorithmic lens.

In this post, I will try to give a very brief description of (or maybe just a set of pointers to) the algorithmic lens, and of what we should imagine when we see an ‘algorithmic X’ subfield of some field X.

Read more of this post

Danger of motivatiogenesis in interdisciplinary work

Randall Munroe has a nice old xkcd on citogenesis: the way factoids get created from bad checking of sources. You can see the comic at right. But let me summarize the process without direct reference to Wikipedia:

1. Somebody makes up a factoid and writes it somewhere without citation.
2. Another person then uses the factoid in passing in a more authoritative work, maybe citing the point in 1 or not.
3. Further work inherits the citation from 2, without verifying its source, further enhancing the legitimacy of the factoid.
4. The cycle repeats.

Soon, everybody knows this factoid and yet there is no ground truth to back it up. I’m sure we can all think of some popular examples. Social media certainly seems to make this sort of loop easier.

We see this occasionally in science, too. Back in 2012, Daniel Lemire provided a nice example of this with algorithms research. But science factoids usually get debunked eventually, with new experiments or proofs. Mostly because it can be professionally rewarding to show that a commonly assumed factoid is actually false.

But there is a similar effect in science that seems to me even more common, and much harder to correct: motivatiogenesis.

Motivatiogenesis can be especially easy to fall into with interdisciplinary work. Especially if we don’t challenge ourselves to produce work that is an advance in both (and not just one) of the fields we’re bridging.

Read more of this post

Cataloging a year of metamodeling blogging

Last Saturday, with just minutes to spare in the first calendar week of 2019, I shared a linkdex of the ten (primarily) non-philosophical posts of 2018. It was focused on mathematical oncology and fitness landscapes. Now, as the second week runs into its final hour, it is time to start into the more philosophical content.

Here are 18 posts from 2018 on metamodeling.

With a nice number like 18, I feel obliged to divide them into three categories of six articles each. These three categories are: (1) abstraction and reductive vs. effective theories; (2) metamodeling and philosophy of mathematical biology; and (3) the historical context for metamodeling.

You might expect the third category to be an afterthought. But it actually includes some of the most read posts of 2018. So do skim the whole list, dear reader.

Next week, I’ll discuss my remaining ten posts of 2018. The posts focused on the interface of science and society.

Read more of this post