A couple of years ago, Robert had a computer science question: one at the data analysis and visualization stage of the relationship between computer science and cancer. Given that I haven’t posted code on TheEGG in a long time, I thought I’d share some visualizations I wrote to address Robert’s question.
There are many ways to measure the size of populations in biology. Given that we use it in our game assay, I’ve written a lot about using time-lapse microscopy of evolving populations. But this isn’t the only — or most popular — approach. It is much more common to dilute populations heavily and then count colony forming units (CFUs). I’ve discussed this briefly in the context of measuring stag-hunting bacteria.
But you can also combine both approaches. And do time-lapse microscopy of the colonies as they form.
A couple of years ago, Robert Vander Velde and Andriy Marusyk were working on experiments that use colony forming units (CFUs) as a measure of populations. However, they wanted to dig deeper into the heterogeneous dynamics of CFUs by tracking the formation process through time-lapsed microscopy. Robert asked me if I could help out with a bit of the computer vision, so I wrote a Python script for them to identify and track individual colonies through time. I thought that the code might be useful to others — or me in the future — so I wanted to write a quick post explaining my approach.
This post ended up trapped in the drafts box of TheEGG for a while, but I thought now is as good a time as any to share it. I don’t know where Robert’s work on this has gone since, or if the space-time visualizations I developed were of any use. Maybe he can fill us in in the comments or with a new guest post.
So let’s just get started with the code.
Of course, we first need to import the main packages: numpy, pyplot, and cv2.
import numpy as np
import cv2
from matplotlib import pyplot as plt
The first two are standard packages, the last one — OpenCV — takes a little bit more work to install.
Now we do two main tasks at once: we load all the images and create something I want to call a ‘space-time map’. A space-time map is an image that uses the colour map of a pixel to represent the number of time points in which it appears. This is just the first name that occurred to me; if you’ve seen this visualisation used before, dear reader, and know its name, then please let me know.
threshImgs_all = []
num_imgs = 24

#load the images and create space-time image (total_img)
img_all = []
total_img = np.zeros_like(cv2.imread(str(0).zfill(4) + '.tif', cv2.IMREAD_GRAYSCALE))

for img_num in range(num_imgs):
    f_name = str(img_num).zfill(4) + '.tif'
    img = cv2.bitwise_not(cv2.imread(f_name, cv2.IMREAD_GRAYSCALE))
    img_all.append(img)
    total_img = cv2.scaleAdd(img, 1/num_imgs, total_img)

plt.imshow(total_img, cmap="magma_r")
plt.show()
This results in an image like:
From this image, we can get persistent numbers for all the colonies that existed:
#get the colonies from the space-time image
_, total_thresh = cv2.threshold(total_img, 127, 255, cv2.THRESH_BINARY)
_, total_colonies = cv2.connectedComponents(total_thresh)

num_colonies = np.amax(total_colonies)
print("There are " + str(num_colonies) + " colonies")
More importantly, the image total_colonies now has each non-background pixel labeled by its colony number, so counting the number of pixels in each colony at each time point becomes as straightforward as applying a mask:
#use the total image (properly thresholded) as the permanent numbers for the colonies;
#get future colony numbers from them
colony_sizes = np.zeros((num_colonies + 1, num_imgs), dtype=int)
#note that colony_sizes[0,:] will contain the amount of empty space

for img_num, img in enumerate(img_all):
    #label colonies by their numbers (works for up to 255 colonies)
    labeled_img = np.minimum(total_colonies, img)

    #get the colonies that appear and their sizes
    colonies, sizes = np.unique(labeled_img, return_counts=True)
    colony_sizes[colonies, img_num] = sizes

for colony in range(1, num_colonies + 1):
    plt.plot(colony_sizes[colony, :])

plt.yscale('log')
plt.show()
Unfortunately, there are a number of colonies that ‘blink’ in and out of existence. This is not a manifestation of reality, but probably an artefact of the image processing software used to produce the initial threshold images and of the sensitivity of the microscope. As such, it can be helpful to clean up the time series, focus only on the colonies that didn’t go extinct during the experiment, and look at their population dynamics.
#let's clean up by eliminating colonies that go extinct at some point
colony_lifetimes = np.sum(colony_sizes > 0, axis=1)
surviving_colonies = np.where(colony_lifetimes == num_imgs)[0][1:]
print(surviving_colonies)

for colony in surviving_colonies:
    plt.plot(colony_sizes[colony, :])

plt.yscale('log')
plt.show()
But the figure that this produces is still difficult to make sense of. I don’t even bother to produce it here, given how many different lines there are going in different directions for growth rates.
What we really care about is higher-level properties of these colonies, like their growth rate, so let’s infer those with the help of scipy:
#among those that don't go extinct, let's calculate the growth rates
from scipy.stats import mstats

growth_rates = np.zeros(num_colonies + 1)
growth_rates_low = np.zeros(num_colonies + 1)
growth_rates_high = np.zeros(num_colonies + 1)

for colony in surviving_colonies:
    growth_rates[colony], _, growth_rates_low[colony], growth_rates_high[colony] = \
        mstats.theilslopes(np.log(colony_sizes[colony, :]))

plt.errorbar(np.arange(num_colonies + 1), growth_rates,
             yerr=[growth_rates - growth_rates_low, growth_rates_high - growth_rates],
             fmt='ko')
plt.xlabel('Colony number')
plt.ylabel('Growth rate')
plt.show()
This yields an easier to look at colony growth rate plot, with 95% confidence intervals.
Above, we have a fitness measure for each colony, so we can look not only at the number of colony forming units but also at differences in how the colonies formed. I still find it hard to make sense of this particular plot, but looking explicitly at the inter-colony heterogeneity does seem like a good exercise. Definitely better than just summarising it as a single variance. Especially since I know from experience that sometimes a variance can hide an interesting discovery.
How would you, dear reader, extend these visualizations? Or is there a good use that you can think of putting them to? After all, visualizations are one of the most important parts of science. I hope this code helps a little. At least as inspiration, or an example of how easy it is to get things done with Python.
Now, dear reader, can you draw a connecting link between this and the algorithmic biology that I typically blog about on TheEGG?
I would not be able to find such a link. And that is what makes computer science so wonderful. It is an extremely broad discipline that encompasses many areas. I might be reading a paper on evolutionary biology or fixed-point theorems, while Oliver reads a paper on i/o-psychology or how to cut 150 micron-thick glass. Yet we still bring a computational flavour to the fields that we interface with.
A few years ago, Karp (2011) wrote a nice piece about the myriad ways in which computer science can interact with other disciplines (see also Xu & Tu, 2011). He was coming at it from a theorist’s perspective — that is compatible with TheEGG but maybe not as much with Oliver’s work — and the bias shows. But I think that the stages he identified in the relationship between computer science and other fields are still enlightening.
In this post, I want to share how Xu & Tu (2011) summarize Karp’s (2011) four phases of the relationship between computer science and other fields: (1) numerical analysis, (2) computational science, (3) e-Science, and the (4) algorithmic lens. I’ll try to motivate and prototype these stages with some of my own examples.
The first stage is the numerical analysis of X. This is solving the equations that already exist in field X but are too big for pen-and-paper. A classic example of this for physics would be from the Manhattan Project: MANIAC and ENIAC processing for sixty days straight on the engineering calculations required to build the hydrogen bomb.
The second stage is the computational science of X. Often this is abbreviated as just computational X. This is when we move from just automating the solution of equations to the sort of work we wouldn’t even consider on paper: simulating and visualizing the objects of X. In the case of physics, it can at times be hard to draw the line between numerical analysis and computational science, but in less mathematized fields it is usually much clearer. In biology, for example, it would be running agent-based simulations of evolution. From the visualization side, it might involve all the algorithms associated with bioinformatics.
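To make the biology example concrete, here is a minimal sketch of what computational X can look like in practice: a two-type Wright-Fisher simulation in Python. The fitness values, population size, and starting frequency are all made up for illustration, not taken from any particular model.

```python
import numpy as np

def wright_fisher(fitness, n_generations, pop_size, rng):
    """Track the frequency of type 1 in a two-type Wright-Fisher population."""
    freqs = [0.5]      # start the two types at equal abundance
    w1, w2 = fitness   # constant (frequency-independent) fitnesses
    for _ in range(n_generations):
        x = freqs[-1]
        p = w1 * x / (w1 * x + w2 * (1 - x))             # selection: expected post-reproduction frequency
        freqs.append(rng.binomial(pop_size, p) / pop_size)  # drift: resample a finite population
    return freqs

rng = np.random.default_rng(0)
# type 1 gets an arbitrary 10% fitness advantage
trajectory = wright_fisher(fitness=(1.1, 1.0), n_generations=200, pop_size=1000, rng=rng)
```

Even this toy already shows the interplay of selection and drift that pen-and-paper treatments only approximate.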
The third stage is the e-Science of X. This is the name that I am most ambivalent about. This is when we manage extensive experimental data and methods for collaboration over the internet. A biological example might be something like folding-at-home, or the various queryable gene or disease databases. In physics, this might involve historic examples like Tim Berners-Lee developing hypertext to help physicists collaborate on high-energy physics projects and inadvertently giving birth to the world wide web. A more recent example might be all the engineering and computer science involved in getting data out of the Large Hadron Collider, or in synchronizing the various observatories around the world for the black hole image. More broadly, it might involve books like Michael Nielsen’s (2011) Reinventing Discovery.
But all three of these stages view computer science primarily as a service provider to field X. As a means to do better the things that field X would do anyway. They don’t fundamentally change the basic objects and theories of X. The computational science stage might shift emphasis: for example, towards token-based rather than type-based models in evolutionary biology. And the discoveries that these resources facilitate could change the field. But the computer science itself in stages one through three is seldom the direct cause of the change.
This is why I can’t agree when Markowetz (2017) writes that all biology is computational biology, for example. All biology might (or should) use computational biology. But this service role in itself does not place computation at the heart of biology. For that we need algorithmic biology.
The fourth stage is the algorithmic lens on X. Computing as a universal way of thinking. This is when we recognize that theories and other theoretical objects are themselves specifications of objects, and that all physical processes can themselves be viewed as computations. Once this link is made, theoretical computer science becomes part of the field itself. Its theorems and perspectives become part of the bedrock of the field. This is what a theoretical computer science interested in a natural science field X aspires to. Karp, Xu & Tu actually call this stage the ‘computational lens’ and might have a slightly broader (but also less ambitious) view of it than I do. But I think it’s better to have a more distinct name to differentiate algorithmic X from computational X.
Regardless of name, these stages are very theory focused. They put theory on a pedestal at the top of the stages. And, in general, this is an unreasonable view. Do you, dear reader, have some suggestions for better categories? Ones that aren’t linear or hierarchical. Ones that don’t ‘end at theory’?
And if Karp, Xu & Tu’s four stages are reasonable: at what stage is your favourite field? Is this good, bad, or irrelevant?
Karp, R. M. (2011). Understanding science through the computational lens. Journal of Computer Science and Technology, 26(4), 569-577.
Markowetz, F. (2017). All biology is computational biology. PLoS Biology, 15(3): e2002050.
Nielsen, M. (2011). Reinventing discovery: the new era of networked science. Princeton University Press.
Xu, Z. W., & Tu, D. D. (2011). Three new concepts of future computer science. Journal of Computer Science and Technology, 26(4), 616-624.
Reading these two books in 2015 might have been an unfortunate premonition of the post-2016 world. And I wonder if a lot of people have picked up Frankfurt’s essays since. But with a shortage of thoughts for this week, I thought it’s better late than never to share my impressions.
In this post I want to briefly summarize my reading of Frankfurt’s position. And then I’ll focus on a particular shortcoming: I don’t think Frankfurt focuses enough on how and what for Truth is used in practice. From the perspective of their relationship to investigation and inquiry, Truth and Bullshit start to seem much less distinct than Frankfurt makes them. And both start to look like the negative force — although in the case of Truth: sometimes a necessary negative.
First, I am not sure if these two works should really count as books; they are basically 20 page essays reformatted with big font, wide margins, and small pages to make cute booklets. However, since I picked them up at Barnes & Noble as books, I thought that I would classify them as such. The former was originally published as an essay in 1986 and after its repackaging as a book it reached #1 on the New York Times bestseller list. This motivated the latter as a follow up.
Frankfurt observes that our life is full of bullshit, and sets out to provide an analysis and definition of the phenomena. He summarizes his finding at the start of the second book: “bullshitters, although they represent themselves as being engaged simply in conveying information, are not engaged in that enterprise at all.” In this deception, they have a commonality with liars, but “[w]hat they care about primarily … is whether what they say is effective in accomplishing this manipulation. Correspondingly, they are more or less indifferent to whether what they say is true or whether it is false.” This indifference is not shared by the liar who must keep an eye on the truth in order to mislead you. As such, Frankfurt believes that the bullshitter is more dangerous to society than the liar. He closes the first book with a strong denouncement of (what he considers to be) the postmodern bend:
One who is concerned to report or to conceal the facts assumes that there are indeed facts that are in some way both determinate and knowable. … Someone who ceases to believe in the possibility of identifying certain statements as true and others as false can have only two alternatives. The first is to desist both from efforts to tell the truth and from efforts to deceive. … refraining from making any assertions about the facts. The second alternative is to continue making assertions that purport to describe the way things are, but that cannot be anything except bullshit.
In the first book, Frankfurt holds the importance of truth as self-evident and leaves it to the second book to answer the questions:
Is truth something that in fact we do — and should — especially care about? Or is the love of truth, as professed by so many distinguished thinkers and writers, itself merely another example of bullshit?
He avoids pinning down exactly what he means by truth, suggesting that the common sense notion — by which, at my most generous reading, I assume he means something like Sellars’ manifest image — will do. Unsurprisingly, he doesn’t only see truth as important but follows Spinoza to the conclusion that anybody who values their life must also (maybe unknowingly) love truth. Frankfurt extends the importance of truth from the individual to society with the historical claim that:
Civilizations have never gotten along healthily, and cannot get along healthily, without large quantities of reliable factual information. They cannot flourish if they are beset with troublesome infections of mistaken beliefs. To establish and to sustain an advanced culture, we need to avoid being debilitated either by error or by ignorance.
The above statement is certainly effective in manipulating me to believe in the value of truth. However, it is also sufficiently vague as to make it impossible to test whether what Frankfurt says is true or whether it is false. Certainly the adaptive nature of positive illusions or our work on religion and the social interface theory might hint toward falsehood. But a sufficiently slippery definition of truth can hint truth.
The real issue is that Frankfurt presents a straw-man of people who deflate or question capital-T ‘Truth’ as an organizing principle. The whole point of pragmatic approaches to the question is to eliminate Truth as a category in favour of whatever lets us avoid error and promotes flourishing. As such, they can agree with Frankfurt’s claim above without attributing it to ‘Truth’. In fact, they might point to very useful and cohesion-enhancing beliefs that would not be Truth for Frankfurt.
If we are to think about Truth then I think we need to think about how Truth is used in practice. In the real world.
From my experience, it isn’t static Truth that enables advances or lets us escape error and ignorance. Rather, it is dynamic Investigation. Truth’s job, instead, is to end investigation and inquiry. To say “this case is done, let’s move on”.
Sometimes this is an important thing to do. Not everything needs to be debated. Not everything needs to be investigated. And not everything needs to be questioned. There have to be priorities. And in this regard Truth can be useful.
I think this also lets us better understand bullshit. One of the practical uses of bullshit is usually the same as the practical use of Truth: to stop investigation and inquiry. Except that using Truth as our stopping point requires some due diligence and wondering about whether the point in question is a reasonable place to stop. And it sometimes even gives us a means to resume investigation later. Bullshit lets us avoid all this.
But both end investigation.
A tempting dissimilarity between Truth and Bullshit’s relationship to Investigation might be their role in motivating investigation. A common position for Truth, and one that Frankfurt takes throughout, is that a desire for Truth can motivate us to investigate. So from my anti-Frankfurt perspective: even if Truth itself is a — at times desirable and necessary — negative, its motivational role is a positive.
But I don’t think this is that different from Bullshit. At least from the garden-hose of misinformation kind of bullshit. From the merchants of doubt kind of bullshit. One of the safety mechanisms built into our notion of Truth is that if we get two conflicting ‘truths’ then we should restart investigation to resolve the contradiction. This is what bullshit can capitalize on if instead of stopping investigation, it wants to start it. By throwing enough disinformation at us, it becomes difficult to know what to believe. This can prompt us to investigate. However, since we are so conditioned on truth and mostly bad at actually carrying out investigations, this often ends up with us just arbitrarily picking the most comfortable — or most repeated or easily accessible — set of propositions as our static set.
In the end, I don’t think the line between Bullshit and Truth is nearly as clear cut as Frankfurt makes it. In particular, if we focus on the uses to which we put both concepts. And without focusing on this practical aspect, I think that Frankfurt fails to engage with the more interesting challenges to capital-T ‘Truth’.
But these are my recollections from a pair of books I read 4 years ago. So I might have forgotten some of the nuance of Frankfurt’s position. I am eager to hear from you, dear reader, on the points that I missed.
This led to a wide-ranging discussion and clarification of what is meant by terms like mechanism. I had meant to blog about these conversations when they were happening, but the post fell through the cracks and into the long to-write list.
This week, to continue celebrating Rockne et al.’s 2019 Mathematical Oncology Roadmap, I want to revisit this thread.
And not just in cancer. Although my starting example will focus on VEGF and cancer.
I want to focus on a particular point that came up in my discussion with Paul Macklin: what is the difference between coarse-graining and abstraction? In the process, I will argue that if we want to build mechanistic models, we should aim not after explaining new unknown effects but rather focus on effects where we already have great predictive power from simple effective models.
Since Paul and I often have useful disagreements on twitter, hopefully writing about it on TheEGG will also prove useful.
I think that there is a difference in kind between abstraction and coarse-graining.
Both allow for multiple realizability, but in different ways.
Let’s start with an example. Let’s take Paul Macklin’s example of a “sufficiently general but descriptive” Complex Adaptive System (CAS). In such a setting, we might talk about a general “[a]ngiogenesis promoter instead of specifically VEGF-A 165”. Many different specific molecules might act as an angiogenesis promoter and might do this at different strengths, and we are not committing ourselves to any one of them. Thus, we have multiple-realizability. And for Paul, this would be an abstraction.
But I disagree. For me, this is a coarse-graining because the fundamental mechanism is pinned down and assumed. All that is left to vary is a real parameter (or a few) specifying strength, but no real complexity is abstracted over. Replacing VEGF-A 165 by an unspecified angiogenesis promoter does not make our life as modellers significantly simpler. In fact, it might make our life harder since instead of worrying about a single parameter setting that captures VEGF-A 165, we have to worry about a family or range of parameters.
With only coarse-graining available to us, Paul is right that we can’t describe a complex adaptive system without using a complex adaptive system.
Abstraction, however, allows us to define new effective ontologies where the effective objects are not related to their CAS reductive counterparts in simple terms. The act of measurement itself can hide computation; the effective object measured might not be computationally easy to determine from a reductive theory. For me, it is only in such cases where our new objects hide complexity that we can say we’ve abstracted over that complexity.
Let’s return to the VEGF-A 165 example. And here I am — as is often the case — flying by the seat of my pants, since I know nothing about VEGF-A 165 apart from it being a signal protein important for the formation of blood vessels. But this knowledge itself is a kind of abstraction.
Since VEGF-A 165 is a protein, we could study its molecular structure and then do the computational chemistry to simulate it. This would be a lot of work, but it would be of very little use if the relevance of VEGF to our model is only in how it affects the formation of blood vessels. In this case, we might only worry about some higher-level property like binding efficiency, or something even more operational: the translation function from VEGF abundance to the rate of blood vessel recruitment. If we can measure these higher-level properties experimentally, then we can let nature do the abstracting for us and just take the relevant output.
This would make our life easier (assuming such a measurement could be done and yielded reliable results).
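As a sketch of letting nature do the abstracting: suppose we had measurements of blood vessel recruitment rate at several VEGF abundances. The numbers below are invented, and the saturating Hill-type shape is just my guess at a reasonable form for such a translation function; the point is that we fit the higher-level property directly, with no computational chemistry in sight.

```python
import numpy as np
from scipy.optimize import curve_fit

def translation(v, r_max, k):
    """Hypothesized saturating map from promoter abundance to vessel recruitment rate."""
    return r_max * v / (k + v)

# invented measurements: (VEGF abundance, observed recruitment rate)
vegf = np.array([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])
rate = np.array([0.09, 0.33, 0.52, 0.65, 0.82, 0.90])

# least-squares fit of the two higher-level parameters
(r_max, k), _ = curve_fit(translation, vegf, rate, p0=[1.0, 1.0])
```

The fitted (r_max, k) is then a property of the whole measured system, not of the molecule alone.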
But care is needed in how we interpret this measurement. The measurement abstracted not only over the details of the chemistry of VEGF-A but probably also over countless other factors, like the geometry of the space in which the molecule is diffusing, the abundance of typically associated molecules, etc. And since — in this hypothetical example — we don’t have a good way to separate this overdetermination, we should not attribute the resultant higher-level property to VEGF-A 165 but to an appropriately defined higher-level object. Much as we wouldn’t assign a temperature to a ‘representative atom’ but instead define it as a higher-level property of ensembles.
Another example of such empirical abstraction that I frequently return to is the idea of effective games. Here, the abstract object (effective game) rolls into its measurement complex and difficult to calculate properties of the reductive object (reductive game, spatial structure, etc). In some ways, we lose this detail, but we gain a theory that is analyzable and understandable. This is what Peter Jeavons and I wrote about for Rockne’s roadmap. From this understanding, we can then roll back our abstraction and measure more specific things about say the spatial structure and thus build a slightly more complex effective theory. We can invert the usual direction of EGT.
But these more complex effective theories will then have a simpler theory to recapitulate at a higher level of abstraction — instead of just having to match some data.
In this way, I propose that we should focus more on working top-down. Building simple (preferably linear) effective theories that are reliable. And only after we are confident in them, should we try to build more reductive theories that when measured in the right way recapitulate the higher-order theories in full.
This is the exact opposite of Paul’s approach. Paul says we should start with a complex reductive model, and if we want a more workable theory then we (quasi-)linearize it.
But I think that we have to start with (quasi-)linear theories and only when they are well coupled to experiment, should we move toward more detail. Only once the simple theory has done all it can for us, should we move to a more reductive one.
I think that we can see successful examples of this throughout the history of science. We knew animal husbandry and selectively improved our crops before we knew about evolution. We could do genetics rather well before we knew about DNA. Or if we want to turn to physics: we could predict eclipses and the positions of planets in the sky before we knew about the inverse-square law of gravitation.
Note that this doesn’t mean that we shouldn’t build progressively more and more reductive theories. It just tells us where we should prioritize. I think that the current urge for many in mathematical oncology is to prioritize building reductive mechanistic theories of effects that aren’t well understood and don’t have any good existing theory to predict them. Instead, we should look at fields where good simple effective theories exist and then aim to build (more) reductive theories that fully recapitulate the effective theory and give us something further: a why.
So let’s return to Noel Aherne’s opening comment: “we have been curing cancers for decades with radiation without a full understanding of all the mechanisms”. Or maybe we can look at his work on how simple linear models of drug-induced cardiac toxicity outperform or do nearly as well “as a multi-model approach using three large-scale models consisting of 100s of differential equations combined with machine learning approach”.
That’s great!
This means that we have either implicitly or explicitly, a good effective theory. So this is exactly where we should start building reductive theories to recapitulate our higher-order knowledge. We should expect these reductive theories to be worse at prediction (at least at first), but for that price, we’ll buy some answers to the question: why?
The most popular connection is to view these models as two different extremes in terms of time-scale.
When we are looking at evolution on short time-scales, we are primarily interested in which of a limited number of extant variants will take over the population, or in how they’ll co-exist. We can take the effort to model the interactions of the different types with each other, and we summarize these interactions as games.
But when we zoom out to longer and longer timescales, the importance of these short term dynamics diminish. And we start to worry about how new types arise and take over the population. At this timescale, the details of the type interactions are not as important and we can just focus on the first-order: fitness. What starts to matter is how fitness of nearby mutants compares to each other, so that we can reason about long-term evolutionary trajectories. We summarize this as fitness landscapes.
From this perspective, the fitness landscapes are the more foundational concept. Games are the details that only matter in the short term.
But this isn’t the only perspective we can take. In my recent contribution with Peter Jeavons to Russell Rockne’s 2019 Mathematical Oncology Roadmap, I wanted to sketch a different perspective. In this post I want to sketch this alternative perspective and discuss how ‘game landscapes’ generalize the traditional view of fitness landscapes. In this way, the post can be viewed as my third entry on progressively more general views of fitness landscapes. The previous two were on generalizing the NK-model, and replacing scalar fitness by a probability distribution.
In this post, I will take this exploration of fitness landscapes a little further and finally connect to games. Nothing profound will be said, but maybe it will give another look at a well-known object.
Our contribution to the mathematical oncology roadmap discussed both fitness landscapes and games. In particular, we focused on the need for both theoretical and empirical abstraction in mathematical oncology. I’ve already written about the evolutionary games side of this abstraction.
But I avoided discussing fitness landscapes.
This was mostly because I feel that fitness landscapes have had less impact than games as concrete models in oncology.
Fitness landscapes conceptualize fitness as a single scalar value — a number. A scalar can only express cell-autonomous effects, where fitness is inherent to the properties of a single cell. But cancer displays important non-cell-autonomous effects that allow fitness to depend on a cell’s micro-environmental context, including the frequency of other cell types. And this is certainly not limited to cancer. Microenvironmental context and the abundance of other organisms almost always matter to the fitness of a particular type.
To accommodate this non-cell-autonomous (or more generally, non-type-autonomous) fitness, EGT views the types as strategies and models fitness as a function which depends on the abundance of strategies in the population.
This is the other way that we can connect fitness landscapes and games:
Fitness landscapes map types to a fitness scalar. Games map types to a fitness function.
Since any scalar can be represented as a constant function, this perspective makes games the more general and foundational perspective. At least in appearance, although often not in practice.
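In code, the distinction (and the embedding) looks something like this toy sketch, with made-up fitness values and payoffs:

```python
# a fitness landscape assigns each type a scalar
landscape = {'A': 1.0, 'B': 1.2}

# a game assigns each type a fitness *function* of the population state;
# here x is the frequency of type 'A' (the payoffs are invented)
game = {
    'A': lambda x: 1.0 + 0.5 * x,  # frequency-dependent fitness
    'B': lambda x: 1.2,            # happens to be frequency-independent
}

# any landscape embeds as a game of constant functions
as_game = {t: (lambda x, w=w: w) for t, w in landscape.items()}
```

The converse fails: no single scalar recovers a genuinely frequency-dependent entry like game['A'].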
As is often the case, greater expressiveness comes at a price. In the case of the greater expressiveness of games, this price is a loss of analysis techniques. For example, when dealing with fitness landscapes, we can often consider the strong-selection weak-mutation limit. In this regime, we imagine that mutations are so rare that the population remains monomorphic except for a brief time during a selective sweep. This allows us as modellers to replace a population by a single point in the landscape.
In the case of evolutionary games, such an approximation is unreasonable since it would eliminate the very ecological interactions that EGT aims to study. This means that the strategy space that can be analysed in an evolutionary game is usually much smaller than the genotype/phenotype space considered in a fitness landscape. Typical EGT studies consider just a handful of strategies (most often just two, or three), while fitness landscapes start at dozens of genotypes and go up to tens of thousands (or even hyper-astronomical numbers of genotypes in theoretical work).
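The strong-selection weak-mutation regime described above can be sketched as an adaptive walk: the population is a single genotype, and each rare mutant either sweeps (if strictly fitter) or is lost. A minimal version, on a made-up additive landscape:

```python
import numpy as np

def sswm_walk(fitness, start, n_steps, rng):
    """Adaptive walk: the monomorphic population is a single bitstring genotype;
    each step proposes a one-locus mutant, which sweeps only if strictly fitter."""
    genotype = start
    for _ in range(n_steps):
        locus = rng.integers(len(genotype))
        flipped = '1' if genotype[locus] == '0' else '0'
        mutant = genotype[:locus] + flipped + genotype[locus + 1:]
        if fitness(mutant) > fitness(genotype):  # strong selection: deleterious mutants never fix
            genotype = mutant
    return genotype

fitness = lambda g: g.count('1')  # toy additive landscape: fitness = number of 1s
rng = np.random.default_rng(1)
peak = sswm_walk(fitness, start='0000', n_steps=100, rng=rng)
```

On this landscape the walk climbs to the all-1s peak; on a rugged landscape it would instead stall at whichever local peak it reaches first.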
But suppose that we did want to work with a huge combinatorially structured space of strategies with frequency dependent fitness. A game landscape. Could we get started?
In the case of scalar fitness, it becomes useful to think about a fitness graph: with each edge between nearby mutants oriented from the lower fitness to the higher fitness type. This results in just two kinds of edges: a directed edge from the less fit type to the fitter type, or a neutral edge with no direction (for equal fitness). More importantly, these edges have a nice global property: their directed graph is acyclic.
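To make the fitness graph concrete, here is a quick Python sketch. The two-locus landscape values are made up for illustration; the point is the edge-orientation rule and the acyclicity property:

```python
# Toy scalar fitness landscape on two-locus genotypes (values made up).
fitness = {'00': 1.0, '01': 1.2, '10': 1.1, '11': 1.5}

def fitness_graph(fitness):
    """Orient each edge between Hamming-distance-1 genotypes from the
    lower-fitness type to the higher-fitness type; equal-fitness pairs
    become neutral (undirected) edges."""
    directed, neutral = [], []
    genotypes = sorted(fitness)
    for i, g in enumerate(genotypes):
        for h in genotypes[i + 1:]:
            if sum(a != b for a, b in zip(g, h)) != 1:
                continue  # only nearby mutants get an edge
            if fitness[g] < fitness[h]:
                directed.append((g, h))
            elif fitness[g] > fitness[h]:
                directed.append((h, g))
            else:
                neutral.append((g, h))
    return directed, neutral

directed, neutral = fitness_graph(fitness)
# Every directed edge strictly increases fitness, so following edges can
# never revisit a genotype: the directed graph is always acyclic.
assert all(fitness[u] < fitness[v] for u, v in directed)
print(directed)
```

On this landscape every path leads up to the peak at '11', which is exactly the global property that scalar fitness buys us.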
Can we get something similar with game landscapes? The obvious first step is to limit ourselves to linear games and consider what weak mutation dynamics might look like. Now our count of edge kinds increases to four: we can still have the two old kinds plus two new ones. But even with the old directed edge, the acyclic property disappears: just consider the rock-paper-scissors game. The new edge kinds are even more interesting. Let’s look at the third and fourth.
The third edge type arises in games like Stag-Hunt where a repulsive fixed point exists: this results in a fitness graph edge that points outward from the middle towards both endpoints. This stops a population from moving across the edge, even in the random drift limit.
The fourth edge type is the most interesting. It arises in games like Hawk-Dove where an attractive fixed point exists. This results in a fitness graph edge that points inwards. It moves the population to a fixed point where both types co-exist and thus even in the strong-selection weak-mutation limit creates a polymorphic population. This can be difficult to deal with.
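To summarize the four edge kinds, here is a small Python sketch that classifies the edge between two strategies directly from their payoff matrix. The invasion conditions are the standard replicator-dynamics ones; the labels and the lumping of all ties into 'neutral' are my own simplification:

```python
def edge_type(A, B, C, D):
    """Classify the weak-mutation edge for the symmetric game
    G = [[A, B], [C, D]]: strategy 1 earns A against 1 and B against 2;
    strategy 2 earns C against 1 and D against 2."""
    one_invades = B > D  # a rare 1-mutant out-competes a resident 2 population
    two_invades = C > A  # a rare 2-mutant out-competes a resident 1 population
    if one_invades and not two_invades:
        return 'directed towards 1'
    if two_invades and not one_invades:
        return 'directed towards 2'
    if one_invades and two_invades:
        return 'attracting (Hawk-Dove-like co-existence)'
    if B < D and C < A:
        return 'repulsing (Stag-Hunt-like bistability)'
    return 'neutral'  # ties: no selective direction on this edge

print(edge_type(3, 0, 2, 1))   # a Stag-Hunt game
print(edge_type(-1, 2, 0, 1))  # a Hawk-Dove game
```

The Stag-Hunt numbers give the repulsing edge of the third kind, and the Hawk-Dove numbers give the attracting edge of the fourth kind.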
But the most non-obvious part of game landscapes is how to represent them. Traditional fitness landscapes are exponentially large objects, so we don’t simply store a long list of scalar fitness values. Instead, we use a compact representation like the NK-model that tells us how to quickly compute a fitness from a genotype. We would need something similar with game landscapes: a mapping from genotype to fitness function. Here two new difficulties arise. First, we need to output a function, not a scalar, so doing something simple like adding together fitness components (as the NK-model does) doesn’t work anymore. Second, what is the domain of the resulting fitness function? The naive answer is that it is all the genotypes: so an exponential domain. This means that our compact mapping from genotype to fitness function has to output a compact mapping of its own (and not just an array of linear coefficients for each potential interaction partner)!
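For reference, here is a minimal Python sketch of the kind of compact representation I mean, in the spirit of the NK-model. The neighbourhood structure and random component tables are illustrative, not any canonical NK variant:

```python
import random
from itertools import product

def make_nk(n, k, seed=0):
    """A compact NK-style representation: each locus i contributes a fitness
    component that depends on its own state and the states of k other loci.
    Storage is n * 2^(k+1) numbers instead of one number per 2^n genotypes."""
    rng = random.Random(seed)
    neighbours = [rng.sample([j for j in range(n) if j != i], k)
                  for i in range(n)]
    tables = [{bits: rng.random() for bits in product((0, 1), repeat=k + 1)}
              for _ in range(n)]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = (genotype[i],) + tuple(genotype[j] for j in neighbours[i])
            total += tables[i][key]
        return total / n  # average of the n local components

    return fitness

f = make_nk(n=10, k=2)
print(f((0,) * 10), f((1,) * 10))
```

Note how the output is a scalar computed by summing components — exactly the trick that stops working when the output has to be a fitness function over an exponential domain.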
Do you have any suggestions on how to handle these difficulties, dear reader?
I think that these challenges of game landscapes can be addressed in interesting ways. In ways that differ from the standard use of fitness landscapes or the replicator-mutator equation. But I’ll save my thoughts on this for a future post.
But I don’t think that microscopic systems are the funnest place to see this interplay. The funnest place to see this is in psychology.
In the context of psychology, you can add an extra philosophical twist. Instead of differentiating between reductive and effective theories, a more drastic difference can be drawn between the scientific and manifest image of reality.
In this post, I want to briefly talk about how our modern theories of colour vision developed. This is a nice example of good effective theory leading before any reductive basis. And with that background in mind, I want to ask the question: are colours real? Maybe this will let me connect to some of my old work on interface theories of perception (see Kaznatcheev, Montrey, and Shultz, 2014).
As somebody with some physics background, when I think of the founding of modern psychology, I usually don’t think of Sigmund Freud, but Hermann von Helmholtz instead. To a modern reader, physics, physiology, and psychology might seem like distant fields, but in Helmholtz’s time these fields were just starting to disassociate themselves and many critical questions of natural philosophy lay at their intersection.
Toward the end of the 1800s, the development and refined understanding of Maxwell’s electromagnetism was taking the scientific image of light and vision further and further from the common sense manifest image of colour and perception. The natural philosophers of the day (the more accurate epithet, given that the title of ‘scientist’ was still struggling for acceptance at the time) could calculate more and more exciting physical properties of light, but could say very little about our subjective experience of light. Helmholtz wanted to reconcile these views by focusing on the physiology and psychology of vision.
In 1850, building on the ideas of Thomas Young, Helmholtz developed the phenomenological trichromatic theory of colour perception. We still use this theory today, although — after Svaetichin’s (1956) work with fish and Dartnall et al.’s (1983) work with humans — we can now rely not just on self-reported perceptions but also know the cellular basis for the photosensitivity: our rods and cones. It is our three types of cones that are responsible for implementing the three dimensions of our colour vision that Helmholtz identified. So, in some sense, Svaetichin (1956) and Dartnall et al. (1983) provide the details of the transformation that maps the reductive world of wavelengths to the effective world of colour.
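To see what this three-dimensional effective description looks like, here is a toy Python sketch. The Gaussian sensitivity curves are made up (only their peak ordering is loosely inspired by the human S, M, L cones), so treat this as a cartoon of trichromacy, not real pigment data:

```python
import numpy as np

# Toy Gaussian cone sensitivities. Peak locations (~420, ~530, ~560 nm) are
# loosely inspired by human S, M, L cones; the curves themselves are made up.
wavelengths = np.arange(400, 701)  # nm

def sensitivity(peak, width=40.0):
    return np.exp(-((wavelengths - peak) / width) ** 2)

S, M, L = sensitivity(420), sensitivity(530), sensitivity(560)

def cone_response(spectrum):
    """Collapse a ~300-dimensional spectral power distribution down to the
    three numbers that trichromatic theory says determine perceived colour."""
    return np.array([S @ spectrum, M @ spectrum, L @ spectrum])

# A narrow greenish band: physically a function over hundreds of
# wavelengths, effectively just three numbers.
narrow = np.where(np.abs(wavelengths - 545) < 5, 1.0, 0.0)
resp = cone_response(narrow)
print(resp)
```

The dimensionality reduction is the whole story: many physically distinct spectra (metamers) collapse to the same three numbers, which is why the reductive and effective descriptions come apart.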
The knowledge of rods and the three types of cone cells is definitely important, but in many ways this physical basis is less important than the psychophysical tradition that Helmholtz (along with Weber, Fechner, and others) established. They built an objective science on the foundation of subjective experience, and they were able to predict the basic structure of the relevant physical basis without having the ability to study it explicitly. In the context of biology, this is kind of how Schrödinger described many of the properties that DNA would need to have, over a decade before the structure of DNA was uncovered.
It is also important to realize that our increasing understanding of the biological basis of basic processes does not reduce the need for phenomenological approaches that take self-reporting as serious evidence. It just moves the approach to more interesting questions, where knowing the basic building blocks is not enough given our lack of knowledge about structure and information processing. My favourite example of this is the interaction of language and colour perception (Gilbert et al., 2006; Regier & Kay, 2009).
So given that we’ve found the transformation that maps our effective theory (psychophysics) onto our reductive theory (physics) does that mean that we can confidently and affirmatively answer the question: Are colours real? Or Nemu Rozario’s more precise statement on the CogSci StackExchange: is there any experiment which can tell us the color we see is the only and real color which is reflected by the object?
The below considerations are based on my old 2013 answer to the question: are colours real?
Usually, for something to be ‘real’, we want it in some reasonable manner to be objective or (because that is extremely vague) at least very consistent across subjective observers. Unfortunately, colour does not satisfy this.
Let’s go through the levels of description:
All of these considerations can be especially useful when thinking about the classic inverted-spectrum problem. It is most popularly stated as: What if someone perceives a color as ‘red’ when it is actually ‘green’?
So what does this mean for the reality of colour? On the one hand, we can clearly say that colours aren’t real. But on the other hand, this feels clearly absurd. Like we missed the point of the question. This is the tension between the scientific and manifest image of reality. In the scientific image of reality, colours are not real. In the manifest image, they are. And contrary to their names, we see from Helmholtz that we can do science at either level.
But the mapping between these images is extremely complicated and variable. A grander version of the complicated mappings between the reductive and effective theories that I usually study.
Dartnall, H. J. A., Bowmaker, J. K., & Mollon, J. D. (1983). Human visual pigments: microspectrophotometric results from the eyes of seven persons. Proceedings of the Royal Society B, 220(1218): 115-130.
Gilbert, A. L., Regier, T., Kay, P., & Ivry, R. B. (2006). Whorf hypothesis is supported in the right visual field but not the left. Proceedings of the National Academy of Sciences of the United States of America, 103(2): 489–494.
Hoffman, D.D. (1998). Visual intelligence: How we create what we see. W.W. Norton, New York.
Hoffman, D.D. (2009). The interface theory of perception. In: Dickinson, S., Tarr, M., Leonardis, A., & Schiele, B. (Eds.), Object categorization: Computer and human vision perspectives. Cambridge University Press, Cambridge.
Kaznatcheev, A. (2017). Two conceptions of evolutionary games: reductive vs effective. bioRxiv: 231993.
Kaznatcheev, A. (2018). Effective games and the confusion over spatial structure. Proceedings of the National Academy of Sciences: 115(8): E1709.
Kaznatcheev, A., Montrey, M., & Shultz, T.R. (2014). Evolving useful delusions: Subjectively rational selfishness leads to objectively irrational cooperation. Proceedings of the 36th Annual Conference of the Cognitive Science Society. arXiv: 1405.0041v1.
Regier, T., & Kay, P. (2009). Language, thought, and color: Whorf was half right. Trends in Cognitive Sciences, 13(10): 439–446.
Svaetichin, G. (1956). Spectral response curves from single cones. Acta Physiologica Scandinavica, 39(Suppl. 134): 17-46.
Recently, I was reading yet another preprint that has observed non-cell autonomous fitness in tumours. In this case, Johnson et al. (2019) spotted the Allee effect in the growth kinetics of cancer cells even at extremely low densities (seeding in vitro at <200 cells in a 1 mm^3 well). This is an interesting paper, and although not explicitly game-theoretic in its approach, I think it is worth reading for evolutionary game theorists.
Johnson et al.'s (2019) approach is not explicitly game-theoretic because they consider their in vitro populations as a monomorphic clonal line, and thus don't model interactions between types. Instead, they attribute non-cell autonomous processes to density dependence of the single type on itself. In this setting, they reasonably define the cell-autonomous null-model as constant exponential growth, i.e. $\dot{N} = wN$ for some constant fitness $w$ and total tumour size $N$.
It might also be tempting to use the same model to capture cell-autonomous growth in game-theoretic models. But this would be mistaken. Such growth is only effectively cell-autonomous at the level of the whole tumour, and could hide non-cell-autonomous fitness at the level of the different types that make up the tumour. This apparent cell-autonomous total growth will happen whenever the type interactions are described by constant-sum games.
Given the importance of constant-sum games (more famously known as zero-sum games) to the classical game theory literature, I thought that I would write a quick introductory post about this correspondence between non-cell autonomous constant-sum games and effectively cell-autonomous growth at the level of the whole tumour.
Let’s start with some definitions.
In a two-player game in classic game theory, if player one plays pure strategy i and player two plays pure strategy j then they receive a payoff of $G_{ij}$ for player one and $H_{ij}$ for player two. We might summarize this by writing $(i,j) \mapsto (G_{ij}, H_{ij})$. In general, these two matrices G and H are arbitrary.
We call a game zero-sum if the gains of one player are always balanced by the losses of the other. Mathematically, we say that G,H is zero-sum if G = -H, or equivalently G + H = 0.
From this, it isn’t hard to imagine what constant-sum games mean in classic game theory. It is when G + H = K where K is a matrix that has the same constant k in every entry. We might be even more specific, and say that this game is a k-sum game. Classic game theorists don’t usually distinguish between zero-sum and k-sum games because constant offsets don’t matter: we could just as easily have given each player a base payoff of k/2 and then played a zero-sum game.
What happens when we move to an evolutionary game theory setting?
If we’re studying a single population of several types then players 1 and 2 get equated. In other words, we demand that our game be symmetric: $H = G^T$ (i.e. H is the transpose of G). Since the two matrices are linked in this way, we usually don’t specify H (since it is redundant) and just make our mapping $(i,j) \mapsto G_{ij}$. If we want this game to be k-sum then our sum restriction for symmetric games becomes: $G_{ij} + G_{ji} = k$. In particular, this implies that each diagonal entry is equal to k/2.
This means that for a two-strategy k-sum symmetric game, there is only 1 free parameter a:

$G = \begin{pmatrix} k/2 & a \\ k - a & k/2 \end{pmatrix}$
I’ll use the two-strategy case as an example for simplicity. But it isn’t difficult to generalize the main ideas to more strategies.
Since I’m interested in hiding games within constant exponential growth, we need to start with an exponential growth model for the two strategies:

$\dot{N}_A = w_A N_A \qquad \dot{N}_B = w_B N_B$
where our growth rates come from the game as:

$w_A = \frac{k}{2} x_A + a x_B \qquad w_B = (k - a) x_A + \frac{k}{2} x_B$
and where $x_A = N_A/N$ and $x_B = N_B/N$ are the proportions of the two strategies with $N = N_A + N_B$ as the total tumour size.
From here, it is straightforward to write down the equation for the tumour growth rate:

$\dot{N} = \dot{N}_A + \dot{N}_B = w_A N_A + w_B N_B = \langle w \rangle N$
where $\langle w \rangle = x_A w_A + x_B w_B$ is the average fitness of the population. And if we let $p = x_A$ (so that $x_B = 1 - p$) then we can also write down the complete system dynamics as:

$\dot{N} = \langle w \rangle N \qquad \dot{p} = p(1 - p)(w_A - w_B)$
Note that this decouples the proportion dynamics from the population growth. This is why it can be perfectly reasonable to use replicator dynamics in growing populations (at least if you have a reason to believe that $w_A$ and $w_B$ are functions of proportions and not densities).
However, in the above equations, the population growth can still depend on the proportion dynamics. This is due to the proportions appearing within $\langle w \rangle$. But if we’re dealing with constant-sum games then this dependence will disappear, as $\langle w \rangle$ will be a constant.
There are several ways to see that $\langle w \rangle$ will be constant for constant-sum games. The easiest is by noting that the average is just a normalized sum over all the pairwise interactions; in each interaction the two payoffs sum to k, so each individual receives k/2 on average, and thus $\langle w \rangle = k/2$ regardless of which pairs meet.
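If you prefer a numerical sanity check to the verbal argument, here is a quick Python sketch with a random constant-sum game (k = 6, the number of strategies, and the seed are all arbitrary choices of mine). Because each pairwise interaction splits a total payoff of k between two individuals, the average fitness always comes out to k/2:

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 6.0, 5                  # an arbitrary constant k and 5 strategies

# Any matrix of the form k/2 + (antisymmetric part) satisfies
# G[i, j] + G[j, i] == k for all i, j.
B = rng.random((n, n))
G = k / 2 + (B - B.T) / 2
assert np.allclose(G + G.T, k)

x = rng.random(n)
x /= x.sum()                   # a random vector of strategy proportions
avg_fitness = x @ G @ x        # <w> = sum_ij x_i G_ij x_j
print(avg_fitness)             # always k/2, no matter what x is
```

Rerunning with any seed or any proportions gives the same average, which is the whole point: the proportion dynamics cannot leak into the total growth rate.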
But if we insist on seeing the arithmetic then we get:

$\langle w \rangle = \sum_{i \in \{A,B\}} x_i w_i = \sum_{i,j \in \{A,B\}} x_i G_{ij} x_j$
At this point, we haven’t used that G is constant-sum. We’ve also written the equations in such a form that if we replaced the summation to range over a set of n strategies instead of just the two in {A,B} then nothing would change.
Let’s now use that G is constant-sum: i.e. that $G_{ij} + G_{ji} = k$ for all i,j. Then we can continue with:

$\langle w \rangle = \frac{1}{2} \sum_{i,j} x_i (G_{ij} + G_{ji}) x_j = \frac{k}{2} \Big( \sum_i x_i \Big)^2 = \frac{k}{2}$
From this (and using $w_A - w_B = a - \frac{k}{2}$ from our parametrization of G), our complete dynamic system (for the two strategy case) simplifies to:

$\dot{N} = \frac{k}{2} N \qquad \dot{p} = p(1 - p)\left(a - \frac{k}{2}\right)$
With the dynamics for the total tumour size $N$ and the dynamics for the frequency $p$ completely decoupled. More importantly, if we were looking at just the total tumour burden then it would look like it has cell-autonomous growth even though there is non-cell-autonomous growth at the level of the subpopulations.
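As a check on the derivation, here is a simple Python simulation (forward Euler, with k, a, the initial conditions, and the step size all chosen arbitrarily by me). The total population grows exponentially at rate k/2 no matter what the proportions are doing:

```python
import math

# Arbitrary k-sum game parameters: G = [[k/2, a], [k - a, k/2]].
k, a = 2.0, 1.5
N, p = 100.0, 0.9          # initial tumour size and proportion of strategy A
dt, steps = 1e-4, 100_000  # forward Euler integration up to t = 10

for _ in range(steps):
    wA = (k / 2) * p + a * (1 - p)
    wB = (k - a) * p + (k / 2) * (1 - p)
    avg = p * wA + (1 - p) * wB        # average fitness: stays exactly k/2
    N += dt * avg * N                  # dN/dt = <w> N
    p += dt * p * (1 - p) * (wA - wB)  # replicator dynamics for strategy A

t = dt * steps
print(N / (100.0 * math.exp(t * k / 2)))  # ~1: total growth looks exponential
print(p)                                   # meanwhile the proportions shifted
```

The ratio on the first line stays near 1 (up to Euler discretization error) even though p has moved substantially: exactly the effectively cell-autonomous growth hiding a non-cell-autonomous game.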
What does this mean? Not much. Especially for Johnson et al. (2019), my observation is irrelevant since they already rule out the constant growth null-model. If one were to level objections at their definitions, it would be about whether the mere use of an abiotic resource should count as non-cell-autonomous or not. But I’ve already engaged with this point when I talked about EGT without interactions.
Mostly, I wanted to share this as another cute observation to add to the list of fun discrepancies possible between the reductive and effective views of population dynamics (for more on that, see Kaznatcheev, 2017).
Johnson, K. E., Howard, G., Mo, W., Strasser, M. K., Lima, E. A., Huang, S., & Brock, A. (2019). Cancer cell population growth kinetics at low densities deviate from the exponential growth model and suggest an Allee effect. BioRxiv, 585216.
Kaznatcheev, A. (2017). Two conceptions of evolutionary games: reductive vs effective. BioRxiv, 231993.
The economist turns to her without looking down and replies: “Don’t be silly, that’s impossible. If there was a $20 bill there then it would have been picked up already.”
This is the fallacy of jumping to fixed-points.
In this post I want to discuss both the importance and power of local maxima, and the dangers of simply assuming that our system is at a local maximum.
So before we dismiss the economist’s remark with laughter, let’s look at a more convincing discussion of local maxima that falls prey to the same fallacy. I’ll pick on one of my favourite YouTubers, THUNK:
In his video, THUNK discusses a wide range of local maxima and contrasts them with the intended global maximum (or more desired local maxima). He first considers a Roomba vacuum cleaner that is trying to maximize the area that it cleans but gets stuck in the local maximum of his chair’s legs. And then he goes on to discuss similar cases in physics, chemistry, evolution, psychology, and culture.
It is a wonderful set of examples and a nice illustration of the power of fixed-points.
But given that I write so much about algorithmic biology, let’s focus on his discussion of evolution. THUNK describes evolution as follows:
Evolution is a sort of hill-climbing algorithm. One that has identified local maxima of survival and replication.
This is a common characterization of evolution. And it seems much less silly than the economist passing up $20. But it is still an example of the fallacy of jumping to fixed-points.
My goal in this post is to convince you that THUNK describing evolution and the economist passing up $20 are actually using the same kind of argument. Sometimes this is a very useful argument, but sometimes it is just a starting point that without further elaboration becomes a fallacy.
Let’s start with a discussion of the fallacy itself.
As with many fallacies, the fallacy of jumping to fixed-points starts from a good place. It starts from taking a good idea but using it uncritically. That good idea is fixed-point theorems.
Fixed-point theory is some of the most useful theory in mathematics. The typical result has the form: take some function F and if it has certain general properties then there exists at least one point x such that F(x) = x. The most famous example of this is Brouwer’s fixed-point theorem. In the case of Brouwer’s theorem, the conditions on F are that it is a continuous function and that it maps a compact convex set to itself. If those conditions are met then there exists an x in the domain of F such that F(x) = x.
For a simple consequence of this theorem, consider any accurate map of a city. If you lay it on a table somewhere in that city then there will be a point on that map that corresponds exactly to “You are Here”. In other words, the point in the real world will coincide with the physical location of its representational point on the map.
An obvious but fun fact to add to our repertoire of knowledge on maps and models.
Of course, just because that point exists, doesn’t mean that you will actually know which point it is. Otherwise, you’d never need a GPS.
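That said, the map-on-the-table F is not just any Brouwer map: it is a contraction, so Banach’s fixed-point theorem applies and simply iterating F converges to the “You are Here” point. A toy Python sketch, with made-up coordinates:

```python
# The map on the table is (roughly) an affine contraction: it shrinks the
# whole city onto the small rectangle of paper. Brouwer only promises that
# a "You are Here" fixed point exists; because this particular F is a
# contraction, iterating it actually converges to that point (Banach's
# fixed-point theorem). All coordinates here are made up.
def F(x, y):
    # a city location (in metres) -> where the map draws that location
    return (0.001 * x + 120.0, 0.001 * y + 45.0)

x, y = 0.0, 0.0        # start the iteration anywhere in the city
for _ in range(100):
    x, y = F(x, y)     # each step shrinks the distance to the fixed point

fx, fy = F(x, y)
assert abs(fx - x) < 1e-9 and abs(fy - y) < 1e-9  # F(x, y) == (x, y)
print(x, y)
```

This constructive shortcut is special to contractions; for a general Brouwer map, existence comes with no such recipe, which is where the trouble in the rest of this post begins.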
The fallacy comes from taking this point for granted. Taking its existence as indicating that we can use it as our starting point in further reasoning.
A less trivial and much more interesting consequence of Brouwer’s fixed-point theorem — actually, it is usually proved from the more powerful (but less famous) Kakutani fixed-point theorem — is Nash’s result about the existence of best-response equilibria (i.e. Nash equilibria) in games with a finite set of pure strategies.
But as the active implications of ‘best-response’ and ‘equilibrium’ suggest, this will make more sense if we focus on dynamic systems.
Often, F is not just some function but a dynamic process that maps a domain to itself. It is an update rule, or time-step, or generator of motion. In that case, the fixed-point is a point that remains at equilibrium and does not change through time.
The simple example of Brouwer’s fixed-point theorem for a dynamic system is stirring a cup of coffee. As you stir in your lump of sugar, there will always be one point on the surface of the liquid that isn’t rotating.
Now, this doesn’t mean that the dynamics will ‘lead to’ this point. You could, for example, have closed orbits or cycles around the fixed points. On these cycles, points will go around in a circle under the application of F and never reach the fixed-point.
In the case of game theory: think about Rock-Paper-Scissors. There is a fixed-point or Nash-equilibrium strategy: the mixed strategy that says to play each of rock, paper, scissors with probability 1/3. But there are also lots of cycles where rock gets replaced by paper which gets replaced by scissors which gets replaced by rock.
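Here is a small Python sketch of replicator dynamics on the standard zero-sum Rock-Paper-Scissors game (forward Euler, with step size and starting mix chosen arbitrarily). The interior fixed point exists, but the population orbits it rather than settling down:

```python
# Standard zero-sum Rock-Paper-Scissors payoffs: winner +1, loser -1.
G = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

x = [0.6, 0.2, 0.2]   # start away from the fixed point (1/3, 1/3, 1/3)
dt = 0.001
for _ in range(100_000):
    w = [sum(G[i][j] * x[j] for j in range(3)) for i in range(3)]
    avg = sum(x[i] * w[i] for i in range(3))
    x = [x[i] * (1 + dt * (w[i] - avg)) for i in range(3)]

# The fixed point exists, but the dynamics never reach it: the population
# keeps cycling rock -> paper -> scissors -> rock around it.
print(x)
```

However long you run this, the final mix stays far from (1/3, 1/3, 1/3): existence of the fixed point tells us nothing about whether the dynamics find it.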
As such, we often care about particular kinds of dynamics F where we can guarantee that cycles don’t occur. In these cases, it can be helpful to rename our fixed-point from equilibrium to local maximum.
A classic example of this would be adaptive strong-selection weak-mutation dynamics on a finite static-fitness landscape. In this case, we can — at most times — imagine our population as monomorphic and thus representable by a single point in genotype space. And evolution is the update rule that at each ‘step’ picks a genotype y of higher fitness adjacent to the genotype x corresponding to our population and moves our population there. This dynamic will have fixed-points at ‘local peaks’ of the fitness landscape: where no adjacent mutation is adaptive.
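A minimal Python sketch of such an adaptive walk, on a deliberately easy made-up landscape with a single peak:

```python
import random

def sswm_walk(fitness, genotype, rng):
    """Strong-selection weak-mutation dynamics as an update rule: keep
    replacing the population's genotype with a fitter adjacent one until
    no adaptive mutation exists, i.e. until a local peak (fixed point)."""
    n = len(genotype)
    while True:
        neighbours = [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
                      for i in range(n)]
        fitter = [g for g in neighbours if fitness(g) > fitness(genotype)]
        if not fitter:
            return genotype        # no adjacent mutation is adaptive
        genotype = rng.choice(fitter)

# An easy single-peaked landscape: fitness counts the number of 1s, so
# the walk always climbs to the global peak at the all-ones genotype.
rng = random.Random(1)
peak = sswm_walk(sum, (0,) * 8, rng)
print(peak)
```

On this landscape the walk finds its fixed point quickly; the question raised later in this post is whether we are entitled to assume it does so on hard landscapes.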
This was what THUNK was referencing with his description of evolution.
When I think of economics 101, I think of supply and demand. Supply and demand is a concept that is so embedded in the popular psyche that it can even be a good anchor for metaphors about evolution.
Let me briefly remind you of supply and demand.
The popular conception of it is as a two dimensional graph with the quantity of the product as the x-axis and the price as the y-axis. The demand curve then proceeds from the top left of the graph towards the bottom right and represents how much product customers are willing to purchase at a given price. The supply curve proceeds from the bottom left to the top right and represents how much product producers are willing to supply at a given price. Where these two curves intersect, we have a fixed-point at which the market has ‘solved’ the problem of supply and demand.
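With made-up linear curves, the market-clearing fixed point is just the intersection of two lines:

```python
# Made-up linear supply and demand curves (quantities as functions of price).
def demand(price):
    return 100.0 - 2.0 * price   # customers buy less as the price rises

def supply(price):
    return -20.0 + 4.0 * price   # producers offer more as the price rises

# Market clearing: demand(p) == supply(p)  =>  100 - 2p = -20 + 4p  =>  p = 20.
p_star = 120.0 / 6.0
q_star = demand(p_star)
assert demand(p_star) == supply(p_star)  # the fixed point 'solves' the market
print(p_star, q_star)
```

For two lines this is trivial to solve in closed form; the interesting question is what happens when the market has many co-varying commodities, which is where the next section goes.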
But economics doesn’t stop at 101. And considers markets with many more dimensions. In this case, economists often turn to the Arrow-Debreu model which generalizes the above intuitions to the case of a large number n of commodities with various potentially co-varying supply and demand functions.
Unsurprisingly — given the context of this blog post — the Arrow-Debreu model has a fixed-point theorem. Proving it is usually an application of the existence of Nash equilibria. The fixed-points of the Arrow-Debreu model are competitive (or Walrasian) equilibria and correspond to a set of prices such that aggregate supplies will equal aggregate demands for every commodity in the economy.
Knowing that such prices exist is very interesting, and a wonderful application of fixed-point theorems.
Believing that such an equilibrium — or something similar — is achieved in practice — at least when thinking about financial products — is the efficient-market hypothesis. It is usually associated with the Chicago school of economics. In such an equilibrium, there is no reliable opportunity for arbitrage and an investor cannot consistently outperform the market.
That is why the economist doesn’t believe that a $20 bill is just lying there on the ground.
There are many possible and reasonable criticisms of efficient-market hypothesis. But given my interest in theoretical computer science, I want to offer the algorithmic critique.
Around 1994, Christos Papadimitriou introduced the PPAD complexity class. PPAD stands for Polynomial Parity Arguments on Directed graphs. He used this to argue that the existence proofs of certain fixed-point theorems like Brouwer’s cannot be transformed into efficient algorithms for finding the guaranteed fixed-points. Since then, we have come to believe that PPAD-complete problems are not solvable in polynomial time — although as with most cstheory conjectures (like P != NP), this remains open.
By 2006, Chen & Deng had proved that finding a Nash equilibrium is PPAD-complete. And in the years that followed, this was transformed into results that it is a PPAD-complete problem to find a competitive equilibrium in the Arrow-Debreu model (even with relatively simple preference functions; Chen et al., 2009), or to even approximate it (Deng & Du, 2008). Thus, it is reasonable to believe that a competitive equilibrium cannot — in general — be found in polynomial time. And since a market has a polynomial number of agents and each has polynomially bounded computational power, this means that the market itself cannot — in general — find a competitive equilibrium.
This means that there can always be relatively easy to spot but constantly shifting opportunities for arbitrage. And you might in fact see $20 just lying on the ground.
When Papadimitriou was defining PPAD, he was actually building on previous work of his that defined a similar complexity class: Polynomial Local Search (PLS; Johnson, Papadimitriou & Yannakakis, 1988). This class corresponds roughly to the difficulty of finding local maxima. Recently, I’ve used this to argue that on hard fitness landscapes evolution — no matter what dynamic it follows — won’t be able to find even a local fitness peak in polynomial time (Kaznatcheev, 2013; 2019). In other words, on hard fitness landscapes, an evolving population will be in perpetual maladaptive disequilibrium. There will always be nearby mutants that have higher fitness. For a more careful discussion of both PLS and my results, see my old post on the computational complexity of evolutionary equilibria.
But let us return to THUNK’s description of evolution.
When he says that evolution has identified a local maximum, he is saying that there isn’t a small mutation that will improve fitness. In the analogy to the economist, he is saying that there cannot be $20 on the ground. Now, it might certainly be the case that there are no adaptive mutations available, but this is not something that we should simply take as following from the definitions of evolution. That would be the fallacy of jumping to fixed-points. Instead, we should try to look at particular populations on particular fitness landscapes and see if any nearby advantageous mutations are available. This is the same thing as the economist not assuming the efficient market hypothesis but instead looking down on the ground and checking if the $20 is there or not.
After all, a free lunch is on the line.
Chen, X., Dai, D., Du, Y., & Teng, S. H. (2009). Settling the complexity of Arrow-Debreu equilibria in markets with additively separable utilities. 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS): pp. 273-282.
Chen, X., & Deng, X. (2006). Settling the complexity of two-player Nash equilibrium. 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS): pp. 261-272.
Deng, X., & Du, Y. (2008). The computation of approximate competitive equilibrium is PPAD-hard. Information Processing Letters, 108(6), 369-373.
Johnson, D. S., Papadimitriou, C. H., & Yannakakis, M. (1988). How easy is local search?. Journal of computer and system sciences, 37(1), 79-100.
Kaznatcheev, A. (2013). Complexity of evolutionary equilibria in static fitness landscapes. arXiv preprint: 1308.5094.
Kaznatcheev, A. (2019). Computational complexity as an ultimate constraint on evolution. Genetics: genetics.119.302000.
Papadimitriou, C. H. (1994). On the complexity of the parity argument and other inefficient proofs of existence. Journal of Computer and System Sciences, 48(3): 498-532.
But this is not the algorithmic lens.
In this post, I will try to give a very brief description (or maybe just a set of pointers) for the algorithmic lens. And of what we should imagine when we see an ‘algorithmic X’ subfield of some field X.
The algorithmic lens is not about computers or computer programs. In the same way that astronomy is not about telescopes and that thermodynamics is not about steam engines. Rather, the algorithmic lens recognizes that our theories, models & hypotheses are algorithms in their own right. Thus, we can use the conceptual tools built by theoretical computer scientists for analyzing and designing algorithms to evaluate and refine our scientific theories, models, and hypotheses.
In other words, whereas ‘computational X’ is a practical branch of X, ‘algorithmic X’ is a theoretical branch of X. ‘Algorithmic X’ is a suite of mathematical techniques taken from theoretical computer science and applied to the conceptual objects of X. A paper in ‘computational X’ might feature simulations, data crunching, and computer programs as central characters. A paper in ‘algorithmic X’ is more likely to include theorems, lemmas, proofs and conceptual analysis. As such, one can often view ‘algorithmic X’ as a branch of pure math that is focused on the computational aspects of X.
Unfortunately, a lot of the mathematics used by theoretical computer scientists is foreign to many classic scientists. Thus, there are far fewer fields with an ‘algorithmic X’ than there are with a ‘computational X’. And although many fields have ‘mathematical X’ branches, these are also slightly different from ‘algorithmic X’ — since these fields often draw on applied math and most often focus on differential equations and statistics. But so far ‘algorithmic X’ has integrated most deeply into fields that are deeply mathematical. Here it often complements the corresponding mathematical and theoretical branches. The two most prominent examples are in physics and economics.
In physics, the relevant subfield is quantum information processing (QIP). In economics it is algorithmic game theory. These are both fascinating areas of research. And I have a little bit of formal experience with the former. In fact, back in 2010, when I was finishing undergrad, I first concentrated on QIP and worked in it for a number of years.
But over time I realized that the algorithmic foundations of QIP are pretty secure and in great hands. And I wanted to go after a field where they were less clearly developed. That is why I shifted to evolution. So I do much of my current work in the up and coming subfield of algorithmic biology.
I’ve already written in more detail about what theoretical computer science can offer biology in a prior post. For a more concrete example, see my post on theoretical versus empirical abstraction.
An important final point is that the interaction of theoretical computer science with the natural sciences is not one sided. Although I am skeptical of claims about ‘learning’ algorithms from nature, I am less skeptical about foundational questions about nature as motivators for studying and developing new kinds of theoretical computer science. Thus, computer science has much to gain from going beyond questions related to technology. Computer science has much to gain by expanding its domain of inquiry to be the whole of the natural world.
1. Somebody makes up a factoid and writes it somewhere without citation.
2. Another person then uses the factoid in passing in a more authoritative work, maybe citing the source from 1, maybe not.
3. Further work inherits the citation from 2, without verifying its source, further enhancing the legitimacy of the factoid.
4. The cycle repeats.
Soon, everybody knows this factoid and yet there is no ground truth to back it up. I’m sure we can all think of some popular examples. Social media certainly seems to make this sort of loop easier.
We see this occasionally in science, too. Back in 2012, Daniel Lemire provided a nice example of this with algorithms research. But usually with science factoids, it eventually gets debunked with new experiments or proofs. Mostly because it can be professionally rewarding to show that a commonly assumed factoid is actually false.
But there is a similar effect in science that seems to me even more common, and much harder to correct: motivatiogenesis.
Motivatiogenesis can be especially easy to fall into with interdisciplinary work. Especially if we don't challenge ourselves to produce work that is an advance in both (and not just one) of the fields we're bridging.
Let me first be a bit more precise about what I mean by motivatiogenesis.
I think that Kyler J. Brown first introduced me to this many years ago when we were both still at McGill. He described to me an all-too-common misbehavior in neuroscience: a bad researcher justifies his work to a biologist with "this is the way computer scientists address this question, and it is of interest purely from theory", and justifies the same work to a computer scientist with "this model is biologically reasonable, and of interest from science". The biologist and computer scientist don't know the other's field well enough to see through this bluff, and the researcher manages to squeeze a poor publication out.
Contrast this with what a good researcher would do. She would justify her work to a biologist purely on biological grounds. And she would justify it to the computer scientist through its contribution to computer science. In other words, a good interdisciplinary scientist would contribute to both fields, and not use the border as a crutch.
But the case of the bad scientist gets worse!
Once the cycle starts and a group gets a few such papers out, the auto-catalytic effect sets in: future work can justify itself by saying "we use a standard model in the field". All of this even though the 'standard model' never had a justification in the first place. Eventually the subfield can start generating and answering its own field-endogenous questions that are fundamentally unhinged from reality.
But unlike a factoid, a false motivation is harder to burst. Especially if a subfield or cottage industry develops around the method. I think that this might be closely related to Jeremy Fox’s notion of Zombie Ideas.
Sometimes you are able to see this motivatiogenesis cycle starting, or already well developed, but there is nothing you can do. Bursting the bubble means publishing an unimpressive negative result or a critique of motivations. This is much less rewarding than upending a commonly believed fact. And much harder than continuing to work in the bubble.
I feel like a lot of potential bubbles (whether they be hot topics or temperate on-going themes) started via motivatiogenesis: they started by chance and continue for sociological reasons. The only way to address such a bubble is by having people who are knowledgeable on both sides of the topic point out why a certain model or approach has neither a strong empirical nor a strong theoretical justification. It can help to have skeptical but engaged colleagues from different fields.
But this is hard to do, and so a motivation bubble will often grow and grow.
Sometimes, new authors don't even realize they've fallen into a trap. If they've been trained within the bubble, it might be impossible to find the appropriate distance for questioning. When reflecting on my own work, I sometimes fear that parts of evolutionary game theory might end up like this.
But even with bubbles, there can be hope. A field is not static, and there are ways to ground it and use the tools developed even if they were developed in an ungrounded way. This is why I am such a big advocate of operationalization of theoretically well-understood simple models.
Do you know any interdisciplinary motivation bubbles in your own fields, dear reader? Have you been part of a motivatiogenesis before?
I feel like I've definitely played a small part in this cycle myself.
For one of my papers in undergrad, I briefly studied pronoun acquisition in children (Kaznatcheev, 2010). It was a computational study and I used Fahlman & Lebiere’s (1990) cascade-correlation neural nets. I used these because it was the ‘standard approach’ in my lab and I had code on hand for running them. However, these types of neural nets are not the best performers from an engineering perspective, and also seem to have no real empirical justification from neurobiology. But I used them because I simply didn’t have the time or energy to consider building (or finding) a better model. Or the wisdom and perspective to question my model choice. So I used the ‘standard model’.
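For readers unfamiliar with cascade-correlation: the core idea is to grow a network greedily, training one hidden unit at a time to correlate with the current residual error and then freezing it as an extra input feature. Below is a minimal numpy sketch of that idea, not the code from the 2010 paper; as a simplification, the output layer is solved by least squares rather than the quickprop training of the original Fahlman & Lebiere algorithm.

```python
import numpy as np

def fit_output(features, y):
    # Solve the output weights exactly by least squares
    # (the original algorithm trains them with quickprop instead).
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    return w

def train_candidate(features, residual, epochs=500, lr=0.5, seed=0):
    # Gradient-ascend a tanh unit's input weights to maximize the
    # covariance between the unit's output and the residual error.
    rng = np.random.default_rng(seed)
    v = rng.normal(scale=1.0, size=features.shape[1])
    centered = residual - residual.mean()
    for _ in range(epochs):
        h = np.tanh(features @ v)
        grad = features.T @ (centered * (1.0 - h**2)) / len(h)
        v += lr * grad
    return v

def cascade_correlation(X, y, n_hidden=2):
    # Grow the net one hidden unit at a time: train a candidate on the
    # residual, then freeze it and treat its output as an extra input.
    features = np.column_stack([X, np.ones(len(X))])  # bias column
    for _ in range(n_hidden):
        w = fit_output(features, y)
        residual = y - features @ w
        v = train_candidate(features, residual)
        features = np.column_stack([features, np.tanh(features @ v)])
    return features, fit_output(features, y)

# XOR: a target that the initial linear readout alone cannot fit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])
features, w = cascade_correlation(X, y)
mse = np.mean((features @ w - y) ** 2)
```

Since the output weights are refit by least squares after each added unit, the training error can only stay the same or shrink as the network grows; how useful each greedy unit is depends on the candidate training, which is the part the original paper spends most of its effort on.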
To make matters worse, the paper has subsequently been cited in engineering work as motivation: "people use CC NNs in science for this class of problems (Kaznatcheev, 2010), so we should continue to work on them".
Thankfully, the overall impact of this old paper of mine has been minimal.
I don't know how to avoid these bubbles forming. But I really liked John Regehr's call for epitaphs for bubbles. Maybe we can learn something useful once they've popped?
What do you think, dear reader? Is this a legitimate concern or an unnecessary worry?
Fahlman, S. E., & Lebiere, C. (1990). The cascade-correlation learning architecture. In Advances in neural information processing systems (pp. 524-532).
Kaznatcheev, A. (2010). A connectionist study on the interplay of nouns and pronouns in personal pronoun acquisition. Cognitive Computation, 2(4), 280-284.