If you prefer my completely raw, unedited impressions in a series of chronological tweets, then you can look at the threads for the three days: Wednesday (14 tweets), Thursday (15 tweets), and Friday (31 tweets).
As before, it is important to note that this is the workshop through my eyes. So this retelling is subject to the limits of my understanding, notes, and recollection. This is especially distorting for this final day given the large number of 10-minute talks.
The best place to start is — I think — at the middle. A little bit after lunch, Nima Dehghani outlined eight computational problems faced by the cell: switches, reconfiguration, noise, response, history, tuning, time, & distribution. Many of the talks focused on how cells approach these challenges. In particular, Zhiyue Lu focused on the challenges of switches, tuning, time, and distributions in the circadian clock of Synechococcus elongatus. He showed how this cell could implement its clock by switching between two limit cycles in the space of chemical concentrations. But as Lu reminded us: “a limit cycle is not a computer, it needs to be tuned”. To have computation, a system must be able to respond to its environment. In this case, by synchronizing properly with the time it’s trying to keep.
But an important part of the response of Lu’s cell to its environment is that he could optimize how the cell sets its clock for various noise distributions. From the engineering perspective, this is the programmability of the cell or chemical system. Swapnil Bhatia expanded extensively on this in his talk on the importance of tinkering with biology. He shared three experiences with us: (1) how to implement arbitrary binary decision diagrams in DNA via cutting & stitching mechanics; (2) how to build a translator from Verilog to DNA and just how difficult biology is to debug; and (3) how to port an enzyme production module from an animal to E. coli (much more difficult than porting from Mac to Windows but still doable). Kobi Benenson pushed this programming of the cell into practical, medical applications. In particular, he shared his efforts to program gene therapies to better target cancer.
Finally, Eric Winfree asked about the design space of possible chemical reaction networks and its limits. And Stefan Bornholdt suggested that we should look for inspiration in the circuit boards of washing machines and the advice of old Bell Labs telephone repair manuals if we want to find the design principles of gene-regulatory networks. Albert Kao added a further design consideration: since we know that a given natural circuit evolved, how does that help us understand it better?
For Chris Kempes, Kao’s consideration leads us to identify physical constraints on the morphology of organisms. For Artem Kaznatcheev, this meant identifying the computational constraints on evolution itself. This constraint reminds us to not let perfect be the enemy of good, and to move away from optimizing to satisficing. This is especially important if we want to answer Melanie Moses’ provocative question: how much does it cost to copy an apple? After all, we need to account not just for the seed and the tree, but the birds and the bees, and the whole of evolutionary history.
So to answer Moses’ question, we have to move away from thinking of computation as restricted to a single cell or organism. Instead, we need to arrive at collective computation. This was the focus for most of the other Friday talks. And I will discuss this more next week.
Since my recent trip to the Santa Fe Institute for the “What is biological computation?” workshop (11 – 13 September 2019) brought me full circle in thinking about algorithmic biology, I thought I’d rekindle the habit of post-workshop blogging. During this SFI workshop — unlike the 2013 workshop in Princeton — I was live tweeting. So if you prefer my completely raw, unedited impressions in tweet form then you can take a look at those threads for Wednesday (14 tweets), Thursday (15 tweets), and Friday (31 tweets). Last week, I wrote about the first day (Wednesday): Elements of biological computation & stochastic thermodynamics of life.
This week, I want to go through the shorter second day and the presentations by Luca Cardelli, Stephanie Forrest, and Lulu Qian.
As before, it is also important to note that this is the workshop through my eyes. So this retelling is subject to the limits of my understanding, notes, and recollection. And as I procrastinate more and more on writing up the story, that recollection becomes less and less accurate.
Thursday started with Luca Cardelli giving a tutorial on reactive systems — a response to Turing machines “shutting out the world”. For me, this echoed a bit the second day of the 2013 Natural Algorithms and the Sciences workshop, which also opened with a presentation by Cardelli. I was very excited about his work then and still am today. But it was also surprising that even though we are in the same department, we had never met in person until this trip to Santa Fe.
The goal of Cardelli’s tutorial was to introduce us to a new way of thinking about computation. A way of thinking about computation that is better suited for distributed systems interacting with their environment. He started by reminding us of classic circuit diagrams for functions and showing how they are transformed when functions are replaced by processes. Unfortunately, unlike functions, processes cannot always be shown by static circuit diagrams. In such cases, we need to replace the diagrams by a symbolic calculus like pi-calculus. This is especially important for thinking about dynamic processes like proliferation and degradation that are central to biology, as well as for considering dynamic connectivity between processes.
Unfortunately, Cardelli didn’t have time to highlight the biological successes of this approach. But my personal favorite example is when Cardelli & Csikász-Nagy (2012) showed that the cell cycle switch robustly implements the Angluin, Aspnes, and Eisenstat (2008) approximate majority algorithm from distributed computing. For me, this is one of the clearest cases of biological computation within the cell.
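The protocol itself is simple enough to sketch. Here is a toy simulation of my own, following the three-state interaction rules of Angluin, Aspnes, and Eisenstat (2008) — a sketch for illustration, not the Cardelli & Csikász-Nagy cell-cycle model itself:

```python
import random

def approximate_majority(n_x, n_y, seed=0):
    """Simulate the 3-state approximate-majority population protocol.

    States: 'X', 'Y', and blank 'B'. At each step a random ordered pair
    (initiator, responder) interacts:
      X,Y -> X,B   and   Y,X -> Y,B   (disagreement blanks the responder)
      X,B -> X,X   and   Y,B -> Y,Y   (a decided agent recruits a blank one)
    """
    rng = random.Random(seed)
    pop = ['X'] * n_x + ['Y'] * n_y
    while len(set(pop)) > 1:  # run until every agent agrees
        i, j = rng.sample(range(len(pop)), 2)
        a, b = pop[i], pop[j]
        if a != 'B' and b != 'B' and a != b:
            pop[j] = 'B'          # disagreement: blank the responder
        elif a != 'B' and b == 'B':
            pop[j] = a            # recruitment: blank adopts initiator's state
    return pop[0]

# With a clear initial majority, the protocol almost always converges to it,
# and it does so quickly (roughly O(n log n) interactions).
print(approximate_majority(70, 30))
```

The appeal for biology is that these rules need no central control: any chemistry that implements the four reactions above computes the majority.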
After, Stephanie Forrest flipped the direction. Instead of talking about computing in biology, she focused on the biology of computing. She discussed both the science and engineering. On the engineering side, she was interested in bio-inspired computing; and on the science side, she was interested in the biological properties (or bio-like properties) of computing. This is unified for her by 5 common principles: (1) modularity and redundancy; (2) learning, memory, and communication; (3) diversity; (4) evolution as a design process; and (5) emergence of bad actors. In particular, she gave two examples from her own work: what we can learn from the immune system for building secure computer systems, and how we can use evolutionary biology to understand the development of software.
From Forrest’s first example, I was particularly sympathetic to the importance of diversity for security. This is something I’ve also written about before, but instead of the human immune system, I used the vulnerability of monocultures in human agricultural practice as an analogy for software security failures. In both cases, it is important to focus on the role of diversity, redundancy, modularity, and adaptability in resisting emergent bad actors. In other words, all of Forrest’s common principles are essential.
Her second example was to look at software from the lens of evolutionary biology. Although her presentation from this workshop is not available online (as far as I know), she has given a longer presentation of this work as part of the Stanislaw Ulam Memorial Lecture Series. I recommend watching it:
On the engineering side, Forrest briefly touched on the GenProg project (Le Goues et al., 2012a,b) for evolving software bug fixes. And on the science side, she touched on why this evolutionary approach to bug fixing works: the software we use emerged by an evolutionary process and thus behaves like a biological system (Schulte et al., 2013). In particular, Forrest claimed that the mutational robustness, epistasis, fitness distributions, and neutral networks of software resemble their biological counterparts. In other words, Forrest was inspired by biology to design her repair tools, but when she started analyzing why these tools worked, she realized it was because the software systems she was repairing looked very biological.
This prompted the audience at the workshop to ask: So is software alive?
Forrest sidestepped the question well: “I don’t think that biologists know what life is any more than computer scientists know what computation is.”
But I would be interested in a much more concrete biological question. Given how computational complexity can constrain biological evolution on some natural fitness landscapes (Kaznatcheev, 2019), what are the special features of the software fitness landscape that make it so friendly to adaptation? Why are software landscapes easy?
The third, and final, presentation of the day was Lulu Qian on programming molecular machines. This was a technical talk on how to build computers from reactions among DNA in a test tube. The goal with this line of research is to take a simple but universal component like the non-linear gates of a neural network and show how to implement them in DNA. After one component can be created reliably, the hope is that the components can be stacked together to allow for larger and longer computations. The reason for picking DNA is that it is relatively well understood and reliable, yet able to interface with or control any biological machine that we might want to place it within. The aim isn’t (necessarily) to get the DNA doing bigger or faster computations than our silicon-based machines, but to do universal, well controlled, and programmable computations in a way that can be directly interfaced with existing biology.
When Qian started this research in 2011, her team could implement a neural network with just 4 weights (Qian, Winfree & Bruck, 2011). But since then, she’s been able to scale it up. Partially from better biology and partially from better computer science. On the CS front, for example, a big break came when she replaced the more standard activation functions she was using in 2011 by the winner-take-all activation function that is still universal and programmable but much more compatible with the natural tendencies of DNA. So 7 years later, Cherry & Qian (2018) could build DNA reaction networks that could properly identify 9 hand-written digits from the MNIST database.
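Cherry & Qian’s actual DNA implementation is far more involved, but the computational core of winner-take-all classification can be sketched in a few lines. This is a toy of my own with made-up 3×3 “digits”, not their MNIST setup:

```python
import numpy as np

# Toy 3x3 binary 'digit' patterns: one remembered exemplar per class.
patterns = {
    'L': np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
    'T': np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
}

def wta_classify(image):
    """Winner-take-all: each class computes a weighted sum (here, the
    overlap with its remembered pattern); only the largest signal wins."""
    scores = {label: float(np.sum(w * image)) for label, w in patterns.items()}
    return max(scores, key=scores.get)

noisy_L = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1]])  # an 'L' with one flipped pixel
print(wta_classify(noisy_L))  # -> 'L' (overlap 5 with 'L' vs 2 with 'T')
```

The attraction of winner-take-all for DNA is that “largest signal survives” maps naturally onto competing strand-displacement reactions consuming a shared resource, rather than requiring a precisely shaped sigmoid.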
Cherry & Qian (2018) implemented a pretrained network. And that is all that is essential for building reliable computational blocks. But now, with Tianqi Song, they are also working on implementing learning within the DNA computer itself. This seems very exciting.
But most exciting for me, was when Qian returned to the general theme of the workshop at the end of her talk. Here, she gave us a more precise question. She replaced “What is biological computation?” by “How could computer science provide insights for better understanding biological systems and for creating synthetic molecular systems with life-like behavior?”
This was a nice unification of science and engineering to take us into the final day.
Angluin, D., Aspnes, J., & Eisenstat, D. (2008). A simple population protocol for fast robust approximate majority. Distributed Computing, 21(2), 87-102.
Cardelli, L., & Csikász-Nagy, A. (2012). The cell cycle switch computes approximate majority. Scientific Reports, 2: 656.
Cherry, K. M., & Qian, L. (2018). Scaling up molecular pattern recognition with DNA-based winner-take-all neural networks. Nature, 559(7714), 370.
Kaznatcheev, A. (2019). Computational complexity as an ultimate constraint on evolution. Genetics, 212(1), 245-265.
Le Goues, C., Nguyen, T., Forrest, S., & Weimer, W. (2012a). GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1): 54-72.
Le Goues, C., Dewey-Vogt, M., Forrest, S., & Weimer, W. (2012b). A systematic study of automated program repair: Fixing 55 out of 105 bugs for $8 each. In 2012 34th International Conference on Software Engineering (ICSE) (pp. 3-13). IEEE.
Qian, L., Winfree, E., & Bruck, J. (2011). Neural network computation with DNA strand displacement cascades. Nature, 475(7356): 368.
Schulte, E., Fry, Z. P., Fast, E., Weimer, W., & Forrest, S. (2013). Software mutational robustness. Genetic Programming and Evolvable Machines.
At least, I know that I learned a lot of new things.
The workshop had around 34 attendees from across the world, but from the reaction on twitter it seems like many more would have been eager to attend also. Hence, both to help synchronize the memory networks of all the participants and to share with those who couldn’t attend, I want to use this series of blog posts to jot down some of the topics that were discussed at the meeting.
During the conference, I was live tweeting. So if you prefer my completely raw, unedited impressions in tweet form then you can take a look at those threads for Wednesday (14 tweets), Thursday (15 tweets), and Friday (31 tweets). The workshop itself was organized around discussion, and the presentations were only seeds. Unfortunately, my live tweeting and this post are primarily limited to just the presentations. But I will follow up with some synthesis and reflection in the future.
Due to the vast amount discussed during the workshop, I will focus this post on just the first day. I’ll follow with posts on the other days later.
It is also important to note that this is the workshop through my eyes. And thus this retelling is subject to the limits of my understanding, notes, and recollection. In particular, I wasn’t able to follow the stochastic thermodynamics that dominated the afternoon of the first day. And although I do provide some retelling, I hope that I can convince one of the experts to provide a more careful blog post on the topic.
On Wednesday, Peter Stadler opened the workshop with a tutorial on the elements of computation. He outlined an extended Chomsky hierarchy (although expressed in terms of the corresponding machine models, not grammars) and discussed where various processes of molecular biology fall within this hierarchy. In other words, where in this ladder of computation should we see individual cells?
Stadler suggested that transcription was at the level of combinatorial logic. Gene regulatory networks ranged from the level of DFAs for simple ones to push-down automata for more complicated networks. And (de)modification of enzymes and methylation of DNA — or the chromatin computer — was Turing-complete (Bryant, 2012).
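To make the DFA level concrete, here is a minimal sketch of my own (a hypothetical toy, not an example from Stadler’s talk) of a simple gene-regulatory switch modeled as a two-state automaton driven by a sequence of regulatory signals:

```python
# A hypothetical gene-regulatory toggle as a DFA: the expression state
# flips between 'on' and 'off' in response to a stream of input signals.
transitions = {
    ('off', 'activator'): 'on',
    ('off', 'repressor'): 'off',
    ('on',  'activator'): 'on',
    ('on',  'repressor'): 'off',
}

def run_dfa(signals, state='off'):
    """Feed a sequence of signals through the automaton, one at a time."""
    for s in signals:
        state = transitions[(state, s)]
    return state

print(run_dfa(['activator', 'activator', 'repressor']))  # -> 'off'
```

The point of the hierarchy is what such a machine cannot do: with finitely many states it cannot count unboundedly or match nested structure, which is where push-down automata and beyond become necessary.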
In the discussion, I expressed concerns about how coarse this hierarchy was. In particular, we seem to lose most of post-70s computer science in the big gap between push-down automata and Turing machines. In general, it seemed like the overly coarse Chomsky hierarchy reappeared as a punching bag in many later discussions. And if there was any agreement during the workshop, it seemed to be that we need to move on from computability to more fine-grained computational complexity. Or maybe something even more refined and specialized for the purposes of distributed computing, in particular. Turing machines were seen as “shutting out the world”, and there was a big push to focus on models of explicitly concurrent systems instead.
After a short break, Michael Levin took the floor to discuss morphological computation: how the structuring of morphology, development, and self-repair computes within us. He focused on experimental biology, giving us examples from caterpillars liquefying their brains as they turn into butterflies to flatworms regrowing their heads after they explode from barium poisoning. Levin was especially focused on how these biological case studies highlighted the organism’s ability to self-model. Or in the language of Turing Machines: to receive themselves as their own input.
Levin gave many examples, but one of the most dramatic was the salamander. These animals are famous for regrowing their limbs, but the extreme robustness of this regrowth is seldom discussed. In particular, if you attach a salamander’s severed tail to the stump of a lost leg then over time that tail will transform into a new leg. For Levin, this was morphological self-surveillance: the process of morphological development could survey itself and remodel towards a globally correct structure. And this wasn’t limited to salamanders. It also occurred in tadpoles as they reconfigured their faces to become frogs, and in flatworms.
For me, planarian flatworms were the main highlight of Levin’s talk. These are multicellular organisms that reproduce by ripping themselves in half and then having each half regrow into a full planarian. The experimental record dates from the late 19th century, when a single planarian was cut up into over 200 pieces, with each piece regenerating into a new functional animal. This amazing regenerative ability makes planarians immortal, but it also means that their somatic tissue has been around for thousands of years and is thus full of mutations. Their genome is a mess. We can’t even say how many chromosomes a planarian has because the number is different between different cells in the same animal. This is a huge conceptual challenge for people who are overly obsessed with genomics.
Most remarkable is the robustness and flexibility of planarian regeneration. For example, large concentrations of barium mess with most biological function in most animals. In the case of planarians, it makes their heads explode. Yet after about 12 days, the animal will regrow a head that is resistant to barium. Since barium is not a chemical that the organisms ever encountered in their evolutionary history, this is an example of true novelty being generated by morphogenesis. The flexibility of planarian morphogenesis can also be harnessed for engineering purposes. Levin is able to grow planarians with 2 or 3 heads (and thus 2 or 3 brains) that continue to live and flourish in their environment.
Jessica Flack closed the morning session with a discussion of the different ways that computer science concepts are used in biology. She favours the programmatic approach: borrowing computer science concepts to work out practical biological problems. But she wants to make the computer science more foundational and precise. Especially for collectives. This urge for a more foundational role for computer science is similar to my advocacy for algorithmic biology as a foundational alternative to the practical focus of computational biology.
A big focus for Flack’s talk was the shift of emphasis between microscopic and — what she called — mesoscopic levels of description, with coarse-graining as the connection. And in the case of biology and computation, the information flow proceeds not only from the microscopic up to the mesoscopic but also from the mesoscopic down to the microscopic. As I’ve stressed before, I don’t think that coarse-graining is the only mapping that we should consider — especially when abstraction is available. And Stephanie Forrest echoed this sentiment in the discussion, arguing that “abstraction is exactly what computation can contribute to biology.”
The central aspect of Flack’s presentation, however, wasn’t the details of the mapping but the feedback between microscopic and mesoscopic theory. Her emphasis was to not just start with the microscopic theory and work up to the mesoscopic but to look at the feedback between the levels. It is important to use a good mesoscopic theory to then identify the relevant microscopic states. For me, this was best enacted by DeDeo, Krakauer & Flack’s (2010) inductive game theory. I have recently also worked towards this end by defining effective games as an empirical abstraction (Kaznatcheev, 2017; 2018), and starting with careful measurements of them via the game assay for type-type interactions (Kaznatcheev et al., 2019) before trying to work down to the microscopic level of cell-cell interaction.
Flack finished with the four types of questions we might ask of biology. We might ask about capacity; we might ask about satisficing, efficiency, and optimality; we might ask about mechanism and architecture; and we might ask about evolution. Computer science can help us answer all four kinds of questions.
After lunch, we focused more formally on how to coarse-grain the microscopic into the mesoscopic using stochastic thermodynamics. Massimiliano Esposito gave the opening tutorial and explained how this framework can help us connect thermodynamics and information. Whereas traditional thermodynamics applies to extremely large systems, stochastic thermodynamics is the thermodynamics of small systems. For an exhaustive review, see Seifert (2012), and for an introduction see Van den Broeck & Esposito (2015). Stochastic thermodynamics’ central concept is local detailed balance, which allows one to assign energies to all the chemical species that are being modeled. This allows us to look at non-equilibrium systems and use the non-equilibrium state as a resource to be used or wasted. In particular, we can use information to extract work and use measurement to ‘create’ energy. This allows us to have a generalized second law of thermodynamics: information is an upper bound on work.
During the tutorial, some controversy erupted over whether we could think of Maxwell’s demon (or better yet, Szilard’s engine) as the conceptually minimal form of life: one bit of measurement, one bit of action, one bit of memory. Tali Tishby defended this position, but many were skeptical.
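As a concrete anchor for that controversy, the textbook Szilard-engine numbers (a standard calculation, not from the tutorial itself) work out as follows:

```latex
% A single-molecule gas in a box of volume V is measured to be in the left
% or right half (one bit of information). Inserting a partition and letting
% the molecule isothermally push it back out extracts at most:
W_{\text{ext}} = \int_{V/2}^{V} \frac{k_B T}{V'}\, dV' = k_B T \ln 2 .
% The generalized second law bounds the extracted work by the information I
% (in bits) acquired through measurement:
W_{\text{ext}} \le k_B T \, I \ln 2 .
```

Erasing the demon’s one-bit memory costs at least the same $k_B T \ln 2$ (Landauer’s bound), which is what rescues the second law overall.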
After, Jordan Horowitz built on Esposito’s groundwork by showing how there isn’t a single second law, but many second laws of thermodynamics. Depending on what information-like measure we use, we get different bounds on the amount of work that can be extracted using that information. Thermodynamic costs are context dependent. At least when we have some coarse-grained black boxes within our system.
Finally, David Wolpert connected this framework to standard aspects of algorithmic information theory such as Kolmogorov complexity. This allows for looking at the stochastic thermodynamics of computation (Wolpert, 2019). Wolpert started by applying this theory “under the lamppost”: he showed how this approach can be used to analyze certain toy problems like circuits and DFAs. The hope is to expand this approach to real biological problems, but Wolpert left this connection to biology for Thursday.
Bryant, B. (2012). Chromatin computation. PloS One, 7(5): e35703.
DeDeo, S., Krakauer, D. C., & Flack, J. C. (2010). Inductive game theory and the dynamics of animal conflict. PLoS Computational Biology, 6(5): e1000782.
Kaznatcheev, A. (2017). Two conceptions of evolutionary games: reductive vs effective. bioRxiv, 231993.
Kaznatcheev, A. (2018). Effective games and the confusion over spatial structure. Proceedings of the National Academy of Sciences, 115(8), E1709-E1709.
Kaznatcheev, A., Peacock, J., Basanta, D., Marusyk, A., & Scott, J. G. (2019). Fibroblasts and alectinib switch the evolutionary games played by non-small cell lung cancer. Nature Ecology & Evolution, 3(3): 450.
Seifert, U. (2012). Stochastic thermodynamics, fluctuation theorems and molecular machines. Reports on Progress in Physics, 75(12), 126001.
Van den Broeck, C., & Esposito, M. (2015). Ensemble and trajectory thermodynamics: A brief introduction. Physica A: Statistical Mechanics and its Applications, 418, 6-16.
Wolpert, D.H. (2019). The stochastic thermodynamics of computation. Journal of Physics A: Mathematical and Theoretical, 52(19), 193001.
This use of Bayes’ law has led to a widespread association of Bayesianism with rationality, especially across the internet in places like LessWrong — Kaj Sotala has written a good overview of Bayesianism there. I’ve already written a number of posts about the dangers of fetishizing rationality and some approaches to addressing them, including bounded rationality, the Baldwin effect, and interface theory. In some of these, I’ve touched on Bayesianism. I’ve also written about how to design Bayesian agents for simulations in cognitive science and evolutionary game theory, and even connected them to quasi-magical thinking and Hofstadter’s superrationality in Kaznatcheev, Montrey & Shultz (2014; see also Masel, 2007).
But I haven’t written about Bayesianism itself.
In this post, I want to focus on some of the challenges faced by Bayesianism and the associated view of rationality. And maybe point to some approaches to resolving them. This is based in part on three old questions from the Cognitive Sciences StackExchange: What are some of the drawbacks to probabilistic models of cognition?; What tasks does Bayesian decision-making model poorly?; and What are popular rationalist responses to Tversky & Shafir?
Let’s start with Tversky & Shafir. The scientists are probably best known today for their prominent roles in Kahneman’s (2011) Thinking, Fast and Slow. The three were colleagues and made many important contributions to psychology — I’d highly recommend reading Kahneman’s book for a broad overview. I’ll focus on the early 90s when Tversky & Shafir observed several violations of rationality in human participants. In particular, they noted the disjunction effect, a violation of the sure-thing principle (for examples, see Shafir & Tversky, 1992; Tversky & Shafir, 1992).
An example of the violation they saw was in the Prisoners’ dilemma (Shafir & Tversky, 1992): if a person knew their partner defected then they also defected (only a 3% cooperation rate); if a person knew their partner cooperated then they usually still defected (only a 16% cooperation rate). However, if they were not sure if their partner defected or cooperated, then they cooperated at much higher rates (a 37% cooperation rate). This violates the naive rationalist expectation that the cooperation rate in the unknown condition should fall somewhere between 3% and 16%. This is called a violation of the sure-thing principle.
More formally, consider an event C and a random variable that can have only two possible outcomes A and B. The law of total probability requires P(C) to be between P(C|A) and P(C|B). A violation is when P(C) > P(C|A) and P(C) > P(C|B) (or with both inequalities reversed). In addition to Shafir & Tversky’s (1992) example with the Prisoner’s dilemma, violations of the sure-thing principle have also been shown by Tversky & Shafir (1992) in a two-stage gambling task; Townsend et al. (2000) in a face-categorization task; and others.
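A quick numeric check (a toy of my own, plugging in Shafir & Tversky’s cooperation rates) shows why no belief about the partner can produce the observed 37%:

```python
def total_probability(p_c_given_a, p_c_given_b, p_a):
    """Law of total probability: P(C) = P(C|A)P(A) + P(C|B)P(B).
    As a weighted average, P(C) must lie between P(C|A) and P(C|B)
    for any belief P(A) in [0, 1]."""
    return p_c_given_a * p_a + p_c_given_b * (1 - p_a)

# Cooperation rates: 3% if partner defected (A), 16% if partner cooperated (B).
for p_a in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_c = total_probability(0.03, 0.16, p_a)
    assert 0.03 <= p_c <= 0.16  # the observed 37% can never arise this way
```

Whatever probability the participant assigns to the partner having defected, the predicted cooperation rate stays pinned inside the [3%, 16%] interval.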
Shafir & Tversky (1992) explained this effect through quasi-magical thinking. Even though the participants knew they had no causal effect on their partner’s choice, when the choice was unknown they still “did their part” to cause a favorable outcome.
Note that given the false belief that your actions magically affect the outcome of the other participant’s decision, it is no longer irrational to cooperate at a higher rate when you don’t know the partner’s decision.
Tversky & Kahneman (1983) described Linda and then asked the participants to make a probability judgement of Linda being a bank-teller, or a probability judgement of Linda being a bank-teller and a feminist. In any Bayesian model (without some grafted mechanisms or weird latent variables) you need to have P(bank-teller & feminist) ≤ P(bank-teller), but the participants judged P(bank-teller & feminist) > P(bank-teller) and thus committed a conjunction fallacy.
Tversky & Kahneman (1983)’s ad-hoc explanation for this was the representativeness heuristic. But Gavanski & Roskos-Ewoldsen (1991) recreated the fallacy in a setting where they showed that this ad-hoc heuristic is not sufficient. Alternatively, it is natural to suspect that Tversky & Kahneman (1983)’s result could be an artifact of participants not understanding the modern concept of probability. But Sides et al. (2002) account for this by using a betting paradigm that uses probabilities implicitly instead of asking participants to report numeric values. They showed that the conjunction fallacy is independent of numeric probability reporting and is thus an intrinsic ‘error’.
The above back and forth of ad-hoc fixes via heuristics and biases is a particularly important point. It shows how unconstrained Bayesian models can run the risk of being just-so stories in the sense of Bowers & Davis (2012).
For a final highlight: when conducting a questionnaire, the order questions are asked in changes the resulting probability judgements. For a purely Bayesian approach, the joint probability of asking A then B and getting a specific pair of answers is P(a & b) = P(b|a)P(a) = P(a|b)P(b) = P(b & a). This would suggest that the order questions are asked in shouldn’t matter. But Feldman & Lynch (1988), Schuman & Presser (1996), Moore (2002), and countless pollsters have shown that the order of questions matters. Hence, we have a failure of commutativity.
This order effect is not confined to questions. It also applies to integrating evidence. The strongest point of Bayesianism is a clear theory of how to update hypotheses, given evidence — i.e. Bayes’ rule. Unfortunately for Bayes’ rule, Shanteau (1970) and Hogarth & Einhorn (1992) showed that for humans this is not always the case. Unsurprisingly, they present ad-hoc heuristics-and-biases alternatives to explain this.
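The commutativity that humans violate is easy to verify for Bayes’ rule itself. Here is a toy two-hypothesis example of my own showing that the order in which two pieces of evidence are integrated makes no difference to the posterior:

```python
def update(prior, likelihoods):
    """One step of Bayes' rule: multiply each hypothesis's prior by the
    likelihood of the evidence under that hypothesis, then renormalize."""
    post = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

prior = {'H1': 0.5, 'H2': 0.5}
e1 = {'H1': 0.8, 'H2': 0.3}  # P(e1 | H), made-up numbers
e2 = {'H1': 0.2, 'H2': 0.9}  # P(e2 | H), made-up numbers

a = update(update(prior, e1), e2)  # integrate e1 first, then e2
b = update(update(prior, e2), e1)  # integrate e2 first, then e1
assert all(abs(a[h] - b[h]) < 1e-12 for h in a)  # same posterior either way
```

Because the update is just repeated multiplication by likelihoods followed by normalization, the final posterior depends only on the set of evidence, never on its order; any observed order effect is therefore evidence against the plain Bayesian model.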
More importantly, these order effects are not confined to the artificial setting of the university lab. They can also be found in the wild. Non-commutativity has been seen in natural settings including clinical diagnoses (Berges et al., 1998) and dispute mediation by a jury (McKenzie, Lee, & Chen, 2002).
If all of these standard violations of probability aren’t enough, even more exotic violations are possible. For example, Aerts & Sozzo (2011) studied membership judgements for pairs of concept combinations. They found that among their participants there were dependencies between concept pairs that violated Bell inequalities. Thus, this data could not be fit by any reasonable classical joint distribution over the concept combinations.
Of course, this list is not exhaustive or comprehensive. I also don’t know how robust each individual study is. But seeing this many anecdotes does make me wonder why people are so inclined to take Bayesian models as the obvious choice. In the end, it is probably just that all models are wrong but some are useful. We can use Bayesian models as a starting point and build from there. We just have to make sure that our building doesn’t produce arbitrary just-so stories.
Aerts, D. & Sozzo, S. (2011) Quantum Structure in Cognition: Why and How Concepts Are Entangled. Quantum Interaction 7052: 116-127.
Berges, G.R., Chapman, G.B., Levy, B.T., Ely, J.W., & Oppliger, R.A. (1998) Clinical diagnosis and order information. Medical Decision Making 18: 412-417.
Bowers, J.S. & Davis, C.J. (2012). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138: 389.
Chater, N., Tenenbaum, J. B., & Yuille, A. (2006). Probabilistic models of cognition: Conceptual foundations. Trends in Cognitive Science, 10(7): 287-291.
Feldman, J.M. & Lynch, J.G. (1988) Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology 73: 421-435.
Gavanski, I., & Roskos-Ewoldsen, D.R. (1991) Representativeness and conjoint probability. Journal of Personality and Social Psychology 61(2): 181-194.
Griffiths, T. L., & Yuille, A. (2008). A primer on probabilistic inference. In M. Oaksford and N. Chater (Eds.). The probabilistic mind: Prospects for rational models of cognition. Oxford: Oxford University Press.
Hogarth, R.M. & Einhorn, H.J. (1992) Order effects in belief updating: the belief-adjustment model. Cognitive Psychology 24: 1-55.
Kahneman, D. (2011). Thinking, fast and slow. Macmillan.
Kaznatcheev, A., Montrey, M., & Shultz, T.R. (2014). Evolving useful delusions: Subjectively rational selfishness leads to objectively irrational cooperation. Proceedings of the 36th annual conference of the cognitive science society. arXiv: 1405.0041v1.
Masel, J. (2007). A Bayesian model of quasi-magical thinking can explain observed cooperation in the public good game. Journal of Economic Behavior & Organization, 64(2): 216-231.
McKenzie, C.R.M., Lee, S.M., & Chen, K.K. (2002) When negative evidence increases confidence: change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making 15: 1-18.
Moore, D.W. (2002) Measuring new types of question-order effects. Public Opinion Quarterly 66: 80-91
Perfors, A., Tenenbaum, J.B., Griffiths, T. L., & Xu, F. (2011). A tutorial introduction to Bayesian models of cognitive development. Cognition, 120, 302-321.
Schuman, H., & Presser, S. (1996). Questions and answers in attitude surveys: Experiments on question form, wording, and context. Sage.
Shafir, E., & Tversky, A. (1992). Thinking through uncertainty: Nonconsequential reasoning and choice. Cognitive Psychology, 24: 449-474.
Shanteau, J.C. (1970) An additivity model for sequential decision making. Journal of Experimental Psychology. 85: 181-191.
Sides, A., Osherson, D., Bonini, N., and Viale, R. (2002) On the reality of the conjunction fallacy. Memory & Cognition 30(2): 191-198.
Townsend, J.T., Silva, K.M., Spencer-Smith, J., & Wenger, M. (2000) Exploring the relations between categorization and decision making with regard to realistic face stimuli. Pragmatics and Cognition 8: 83-105.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: the conjuctive fallacy in probability judgement. Psychology Review 101: 547-567.
Tversky, A., & Shafir, E. (1992). The disjunction effect in choice under uncertainty. Psychological Science, 3:, 305-309.
In this post, I want to play with Quine’s web of belief metaphor in the context of science. This will force us to restrict it to specific domains instead of the grand theory that Quine intended. From this, I can then adapt the metaphor from belief in science to c-liefs in mathematics. This will let me discuss how complexity class separation conjectures are structured in theoretical computer science and why this is fundamentally different from model assumptions in natural science.
So let’s start with a return to the relevant philosophy.
For Quine, the web of belief metaphor was part of a grand, sweeping theory. He was arguing against the analytic vs synthetic distinction globally. So his web contained all kinds of beliefs: personal beliefs; scientific hypotheses, facts, and theories; mathematical conjectures and theorems; and even the rules of logic. In principle, any belief could be revised, but the latter ones — like math and logic — are often just too central to our web of belief to give up or revise. We would almost always give up the more peripheral beliefs before something as central as our belief in the rules of logic. This grandiose view of the web of belief opens Quine up to sometimes pedantic-seeming — but well placed and helpful — objections like Daniel Tippens’ argument for the incoherence of giving up the principle of non-contradiction.
But we don’t have to be as grandiose as Quine in our use of the web of belief metaphor. Instead of using the web of belief to demolish the analytic vs synthetic distinction, we could use it locally as a metaphor for how ‘facts’ hang together in some concrete (sub-)field of science. We can focus on some limited domain of knowledge. For example, we might talk about the web of belief in oncology — an interesting question: how does the web of belief of mathematical oncologists differ from the web of belief of clinical oncologists? — or evolutionary biology. Or we might restrict to even smaller settings — like the proceedings of a particular criminal trial. It would be silly if the prosecuting attorney launched an attack on the law of noncontradiction during an obstruction of justice case; or if a defence lawyer tried to challenge our belief in the identity of indiscernibles in an anti-trust case. Instead, in such settings we would focus on a limited local web of belief with a certain number of auxiliary beliefs that are treated as unquestionable for the purposes of that particular discourse.
I want to think only about local webs of belief for two reasons. First, I find it more useful for how I make sense of science and math in my day-to-day work. Second, unlike Quine, I do want to use a rough practical distinction between types of beliefs and knowledge, even if in some grand Quinean metaphysical theory they would just lie as far-apart points on some continuum of web-centrality. In particular, in my day-to-day work, scientific beliefs and mathematical beliefs are fundamentally different. A theorem has a certainty of a different kind than a scientific fact. And even a conjecture feels different from a hypothesis.
I think that these limited local webs of belief can be useful for thinking about the role of conjectures in mathematics. Especially the role of complexity class separation conjectures in theoretical computer science. Let me call this special web the web of c-lief.
Theoretical computer science is in an interesting position among branches of mathematics in that much of its bedrock rests on a series of interconnected conjectures — a web of c-lief. Most famous among these is the P vs NP problem, or the conjecture that P != NP. But there are many similar complexity class separation conjectures, like L or NL vs P, P vs PSPACE, or the polynomial hierarchy. There are also more specialized complexity class separations, like the FP != PLS conjecture (on which I build much of my work in algorithmic biology) and the related FP != PPAD conjecture (that powers algorithmic game theory). Plus there are much more speculative conjectures, like the exponential time hypothesis, and divisive ones, like the unique games conjecture for hardness of approximation. Still, for most of the core conjectures, people tend to share similar beliefs on the truth status, leaning in favour of conjecturing complexity class separations rather than complexity class equivalences.
There is a rich web of interdependence and implication between these conjectures. This includes implications like P != NP implies P != PSPACE, where both the condition and the consequence are expected to be true. And it includes hypothetical implications like L = P implies (by the space hierarchy theorem) that P != PSPACE, where most people don’t expect the condition to be true but do believe the consequence (i.e. most people believe that L != P and P != PSPACE).
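As a toy sketch (my own illustration, not a standard tool of the field), we can make this web explicit by encoding conjectures as nodes and proven implications as directed edges, then reading off the conditional consequences of taking any one conjecture as a c-lief:

```python
# Proven implications between conjectures: "if key holds, then each value holds".
implications = {
    "P != NP": ["P != PSPACE"],    # since P is contained in NP, which is contained in PSPACE
    "L = P": ["P != PSPACE"],      # via the space hierarchy theorem (L != PSPACE)
    "P != PSPACE": [],
}

def consequences(conjecture):
    """Everything that provably follows if `conjecture` holds."""
    seen, stack = set(), [conjecture]
    while stack:
        for nxt in implications.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(consequences("P != NP"))  # {'P != PSPACE'}
```

The point is not the code but the structure: each edge is a theorem, so all the uncertainty lives in the nodes.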
All of these various implications are very clear and precise, much more so than the links in the scientific web of belief.
The links in the theoretical computer science web of c-lief have the certainty of theorems, but the nodes have only the certainty of scientific ‘facts’.
Thus, the web of c-lief provides a very formal and precise description of what we don’t know. More importantly, it also often includes information about why we don’t know.
For many of the most central conjectures in the web of c-lief, we not only know how they relate to many other results but we also know why our current proof techniques cannot possibly prove them. In other words, we have theorems that establish certain barriers to proofs. We know that a proposed proof of some of the central conjectures cannot have certain structural features that are shared by most proofs that we’ve found so far for other theorems in theoretical computer science. Concretely, we know that a proposed proof of P != NP (and some other closely related conjectures) must overcome relativization, natural proofs and algebrization barriers (or for a proof of P = NP, there are other barriers).
In other words, in the cstheory web of c-lief, unlike the scientific web of belief, we know very precisely what we don’t know, how it relates to other things that we don’t know, and in many cases even why we don’t know it. It is this explicitness, precision, and concreteness — rather than just the ‘c’ at the start of ‘conjecture’ — that prompts me to call these objects c-liefs.
In 2008, Tamar Gendler introduced the concept of an alief for a mental state corresponding to an automatic or habitual belief-like attitude. She is especially interested in aliefs that are in tension with our explicit beliefs. For example, a person standing on a transparent balcony may believe that they are safe, but alieve that they are in danger. In other words, aliefs are more implicit, less transparent, and less communicable than beliefs.
I want to follow the alphabet further — up to c-liefs. As I tried to sketch above, scientific beliefs are less explicit and less clearly connected and articulated than the c-liefs of mathematical conjectures. More importantly, even the best scientists don’t always keep good track of the conditional nature of their beliefs and forget to stay mindful of the ‘this is conditional knowledge’ flag attached to them. A competent mathematician, however, even if she is making a very long and complicated deduction from a conjecture or set of conjectures, is always aware that these statements are conditional on the as-of-yet unknown truth-value of those conjectures. She never forgets the ‘this is conditional knowledge’ flag in the way that scientists sometimes do.
Thus, whereas a-liefs are less transparent and less clear or precise than beliefs, c-liefs are more transparent, more clear, and more precisely stated than beliefs. Whereas we might not know that we hold certain a-liefs and act on them as if they are facts without realizing their conditional nature; when we hold certain c-liefs, we are always explicit about remembering the ‘this is conditional knowledge’ flag and propagating that uncertainty through our deductions.
With our web of c-lief metaphor sketched, let us turn to the way theoretical computer scientists use this web. Since local webs are meant as a practical way of making sense of beliefs, rather than as a grand unified theory, this use question is central.
Already in my description of cstheory c-liefs, as I jumped between the language of problem, conjecture, and foundation, it is clear that complexity class separation conjectures play several roles in cstheory. In one way, as with much of mathematics, these conjectures are problems or precisely stated puzzles that computer scientists work to resolve. People want to prove that P != NP. They want to resolve central conjectures. There might even be some serious researchers working directly on the question. But in most cases, due to the numerous barriers, people tend to go after more approachable conjectures in the periphery of the web of c-lief with the hope that resolutions started there might eventually propagate to the centre of the web. This is similar to how scientists tend to design experiments that go after peripheral hypotheses in the scientific web of belief to slowly grow it.
But not all cstheorists work towards resolving the conjectures in the web of c-lief. Instead (or in addition), many cstheorists use the web of c-lief as bedrock for much of their work. For example, if you prove that some obscure problem you are trying to find a good algorithm for is NP-complete then you will stop searching for an efficient algorithm. Instead, you will try to restrict the problem to a tractable subclass of instances or switch focus to approximation algorithms. When working in this mindset, you are not trying to resolve the P vs NP problem by establishing your new problem as NP-complete. Rather, you are using the web of c-lief to guide you towards new approaches to finding an algorithm for a now refined version of your problem. In this case, you are acting as if P != NP (or at least as if you won’t be the one to resolve P vs NP) to simplify your life and focus your work.
To make this very concrete and personal: when I prove that the NK-model with K > 1 is PLS-complete, I am not hoping to use this to resolve the FP vs PLS conjecture. Instead, I use our belief in FP != PLS to then make the scientific claim that computational complexity can be an ultimate constraint on evolution. The c-lief is a bedrock on which I build further scientific beliefs.
In this way, the complexity class separation conjectures seem to work like axioms or model assumptions. In fact, I was prompted to write this post by Karen Kommerce’s tweet comparing complexity class separation conjectures to (empirically unjustified) assumptions in economic models. I don’t think that this is a reasonable comparison. Hopefully this post has explained why. For me model assumptions and conjectures are different in much the same way as scientific facts and mathematical theorems.
The most obvious dissimilarity is in the levels of certainty. This is not general to all conjectures, but applies only to central c-liefs in theoretical computer science. As the old joke goes, if computer scientists used the same standards of proof as physicists then the P vs NP question would have been solved in the 70s. Experimentally, the question of non-existence of an efficient algorithm for an NP-complete problem has been tested much better than most physical ‘facts’. In this way, these central conjectures have a much more solid grounding than most model assumptions.
Second, and more general, is that model assumptions are seldom known precisely. They are often very implicit. Sometimes implicit to the point that we might not endorse them if they were made explicit. And a lot of good scientific and philosophical work can be done in uncovering model assumptions. In this way, they are the polar opposite of c-liefs. A conjecture is only a conjecture when it is stated precisely.
But even when we do state our model assumptions relatively precisely (as some formal modellers do), we are not making these statements to highlight our ignorance. Instead, we are stating the model assumption to get the conversation going. The conversation might be productive and enlightening even if the model assumptions are wrong. A conjecture, in contrast, highlights our ignorance and does not let us forget the conditional nature of all the results based on it. The web of c-lief also interconnects the dependencies between conjectures much more explicitly than any set of model assumptions that I am familiar with.
Finally, the context of use for cstheory conjectures versus most model assumptions is very different. In the case of economics and evolutionary biology, most models are heuristics. As such, the truth value of the assumption matters much less than the usefulness of the overall model. This is part of the reason why economists are not particularly taken aback by the ‘but humans aren’t rational’ critique. The economist knows that her assumptions are wrong, but that in itself doesn’t make her model not useful. To convince the economist, I would have to not only challenge her model’s assumptions but also provide an alternative, better model. In the case of cstheory conjectures, however, when they are used in modelling it is more often for abstraction.
Just as in my ESEB talk, I’ll use triangles to explain the distinction between idealized vs. abstract models.
So let us imagine evolutionary biology as a bunch of triangles. We can think of each of these triangles as representing a different biological process that implements evolution. In particular, let us think of each triangle as a different population with its own structure, demography, standing genetic variation, etc. and thus its own corresponding evolutionary dynamic. In the top right corner, the green triangle might correspond to a bee colony with a very specific and strange sex ratio. Over on the bottom right, the orange triangle might be a biofilm of slime molds with their complicated spatial structure. Maybe the blue triangle is a population of antelope undergoing range expansion. We could go on and on. For every empirical population studied at ESEB, we might imagine a corresponding triangle.
The point is that they each have some very particular details. These can be very different for each evolutionary dynamic. So how do we usually deal with this complexity?
We deal with this complexity by idealizing.
We pick a particularly simple or convenient evolutionary dynamic. One that we think is general. Something like using an equilateral triangle as a stand in for the mess of real triangles. We will often argue that this particularly simple model gets at the ‘essence’ of all evolutionary dynamics. But in reality, our choice is often guided by our methods. We pick the equilateral triangle — or the strong-selection weak-mutation dynamics — because we have the mathematical skills necessary to analyze it. And from then on we suppose that evolutionary dynamics is an equilateral triangle and analyze it as such.
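To make the idealization concrete, here is a minimal sketch (my own toy illustration, not a model from any particular paper) of the strong-selection weak-mutation dynamics mentioned above: the population is represented by a single genotype, one random mutant arises at a time, and it fixes only if it is strictly fitter. The additive landscape in `fitness` is an assumption chosen purely for simplicity.

```python
import random

random.seed(1)

L = 10  # genome length

def fitness(g):
    # toy additive landscape (an assumption): fitness = number of 1-bits
    return sum(g)

# SSWM adaptive walk: one mutant at a time, fixation only if strictly fitter
genotype = [0] * L
while fitness(genotype) < L:  # the all-ones genotype is the unique peak
    i = random.randrange(L)
    mutant = genotype[:]
    mutant[i] ^= 1  # flip one random bit
    if fitness(mutant) > fitness(genotype):
        genotype = mutant

print(fitness(genotype))  # 10: the walk always reaches the peak here
```

On this smooth landscape the walk reaches the peak quickly; the idealization is exactly the assumption that the real triangles behave like this convenient one.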
If we end up talking with more experimentally oriented colleagues, we might say: “oh yeah, this is kind of like the green bee colony”. But our colleague might study slime molds and we would have to admit that it is not so much like the orange slime mold triangle. At that point, the resourceful modeler might offer to deform their idealized triangle to get one that looks more like the slime molds. So we end up endlessly modifying our idealized models with various features that we want to add or take into consideration. Of course, in practice this is made extra difficult by our lack of knowledge about what kinds of triangles occur in nature.
I think that this idealization approach is the standard approach in theoretical and mathematical biology.
But it isn’t the only approach that we can take.
Instead of making an idealization, we can follow the route of abstraction. We can just note that all the shapes we drew are triangles. And so let us see what we can conclude from properties that all triangles have in common.
Unfortunately, abstraction comes with some downsides. First, it means that we cannot get certain specific results. We can say much more about a specific equilateral triangle than we can about an arbitrary triangle. Second, we lose some things. An equilateral triangle is a concrete triangle, it ‘looks’ like a triangle. An equilateral triangle ‘resembles’ the triangles it is modeling. The concept of triangle, however, is not a concrete triangle. It doesn’t ‘look’ like anything. It doesn’t ‘resemble’ the system it models. Rather, it specifies a language in which that system can be expressed. Thus, the abstraction can be of a different type than the things it abstracts over. And we need different tools for dealing with this.
This is where the tools of theoretical computer science come in.
How do we reason about arbitrary triangles? Or in our case: arbitrary evolutionary dynamics with arbitrary population structures, etc. For this, I use theoretical abstraction.
As I’ve discussed several times before on TheEGG, all these various evolutionary dynamics are still algorithms. Thus, they are subject to the laws of computational complexity. We can use these laws to establish general results like the difficulty of reaching local fitness peaks. See Kaznatcheev (2019) for more.
But abstraction can also help with experiment, not just theory. Or as I’ve written before: abstract is not the opposite of empirical.
In the language of triangles, we might care about some specific property of triangles, like their area. Normally, we would find this area by measuring all three sides, or by measuring two sides and the angle between them. Then from these reductive measurements we would compute the effective area. In the context of evolutionary dynamics, especially evolutionary game theory, this might correspond to knowing the pairwise interaction between strategies and then running that interaction over some spatial structure to get some surprising prediction about which strategy comes to dominate the population under this particular spatially structured evolutionary dynamic. As I discuss in Kaznatcheev (2018), this direction from reductive to effective has been the standard approach in much of evolutionary game theory.
But do we need to always measure these reductive details that identify a particular triangle? After all, many triangles have the same area. And if we only care about the area then we don’t need to know which particular combination of side-lengths resulted in our area. Especially if we can come up with a clever way to measure area directly without measuring side lengths.
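To make the multiple realizability of area concrete, here is a small sketch (my own illustration): two triangles with different side-lengths — different reductive measurements — share the same area, the effective property.

```python
from math import sqrt

def area_heron(a, b, c):
    # effective property (area) computed from reductive measurements (sides)
    s = (a + b + c) / 2  # semi-perimeter
    return sqrt(s * (s - a) * (s - b) * (s - c))

# two triangles with different sides...
print(area_heron(3, 4, 5))         # 6.0
print(area_heron(2, 6, sqrt(40)))  # ~6.0: same area, different sides
```

If all we care about is the area, nothing forces us to first recover which of these (or infinitely many other) triangles produced it.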
I don’t know how to do this for the area of triangles, but I do know how to do this for effective games (Kaznatcheev 2017; Kaznatcheev et al., 2019). In the context of evolutionary games, the clearest example of multiple-realizability is due to spatial structure. I highlight this in the figure below.
Consider some effective, population level game. Say the Leader game we measured in non-small cell lung cancer (Kaznatcheev et al., 2019).
$latex G_\text{eff}$ is like the area of a triangle. It can be implemented by many different side measurements, which in this case correspond to different reductive games.
For example, with an idealized inviscid population structure, $latex G_\text{eff}$ is implemented by a reductive game that has the same numeric values. But in an idealized spatialized population like a death-birth 3-regular random graph, it is instead implemented by a qualitatively different Hawk-Dove game.
But most important is the case where we have nature implementing our game. It is some experimental triangle with specific side-lengths that give us our area. The experimental process of the game assay calculates this ‘area’ — i.e. the effective game. But we don’t need to know the details of this reductive game and the transformation if all we care about is the effective game. In other words, we can be happy with the effective game alone.
This can be useful if we care about something like the outcome for a patient — a global effective property — but don’t know the details of the interactions going on within the tumour — the local reductive game. In this case, we want a process like the game assay to measure the global effective game without first having to learn the reductive game and the details of the population structure that transforms it.
Hopefully thinking about abstraction and multiple realizability can help us go after these kind of goals more clearly.
Kaznatcheev, A. (2017). Two conceptions of evolutionary games: reductive vs effective. bioRxiv, 231993.
Kaznatcheev, A. (2018). Effective games and the confusion over spatial structure. Proceedings of the National Academy of Sciences, 115(8): E1709-E1709.
Kaznatcheev, A. (2019). Computational complexity as an ultimate constraint on evolution. Genetics, 212(1): 245-265.
Kaznatcheev, A., Peacock, J., Basanta, D., Marusyk, A., & Scott, J. G. (2019). Fibroblasts and alectinib switch the evolutionary games played by non-small cell lung cancer. Nature Ecology & Evolution, 3(3): 450.
The replication crisis is most often associated with psychology — a field that seems to be having the most active and self-reflective engagement with the replication crisis — but also extends to fields like general medicine (Ioannidis, 2005a,b; 2016), oncology (Begley & Ellis, 2012), marketing (Hunter, 2001), economics (Camerer et al., 2016), and even hydrology (Stagge et al., 2019).
When I last wrote about the replication crisis back in 2013, I asked what science can learn from the humanities: specifically, what we can learn from memorable characters and fanfiction. From this perspective, a lack of replication was not the disease but the symptom of the deeper malady of poor theoretical foundations. When theories, models, and experiments are individual isolated silos, there is no inherent drive to replicate because the knowledge is not directly cumulative. Instead of forcing replication, we should aim to unify theories, make them more precise and cumulative and thus create a setting where there is an inherent drive to replicate.
More importantly, in a field with well-developed theory and large deductive components, a study can advance the field even if its observed outcome turns out to be incorrect. With a cumulative theory, it is more likely that we will develop new techniques or motivate new challenges or extensions to theory independent of the details of the empirical results. In a field where theory and experiment go hand-in-hand, a single paper can advance both our empirical grounding and our theoretical techniques.
I am certainly not the only one to suggest a lack of unifying, common, and cumulative theory as the cause of the replication crisis. But how do we act on this?
Can we just start mathematical modelling? In the case of the replication crisis in cancer research, will mathematical oncology help?
Not necessarily. But I’ll come back to this at the end. First, a story.
Let us look at a case study: algorithmic trading in quantitative finance. This is a field that is heavy in math and light on controlled experiments. In some ways, its methodology is the opposite of the dominant methodology of psychology or cancer research. It is all about doing math and writing code to predict the markets.
Yesterday on /r/algotrading, /u/chiefkul reported on his effort to reproduce 130+ papers about “predicting the stock market”. He coded them from scratch and found that “every single paper was either p-hacked, overfit [or] subsample[d] …OR… had a smidge of Alpha [that disappears with transaction costs]”.
There’s a replication crisis for you. Even the most pessimistic readings of the literature in psychology or medicine produce significantly higher levels of successful replication. So let’s dig in a bit.
How would finance make sense of this failure to replicate?
The first defence in finance is an ontological one: the market cares about research on the market. Specifically, if you find a winning strategy then others will copy you and the strategy will stop winning. In quantitative finance, this is given a special name: alpha-decay.
/u/chiefkul reports that “[e]very author that’s been publicly challenged about the results of their paper says it’s stopped working due to “Alpha decay” because they made their methodology public.”
In other words, the authors claim that the disappearance of their results is not regression to the mean due to p-hacking for a big alpha. Instead, it is an actual change in the market due to others using the strategy or feature they discovered and thus making their strategy obsolete.
This sort of defence is occasionally used in psychology as well. Some past result is so widely reported that the deception (or some other feature) that underlies the experiment is no longer possible and thus a real result disappeared.
On the surface, regression to the mean and alpha decay can seem difficult to tell apart. But regression-to-the-mean has an important time symmetry that alpha-decay doesn’t: we expect to see a regression to the mean if we extend our dataset either forward or backward in time. Alpha decay, on the other hand, should only appear after publication and so if we extend our historic data back in time to before what the paper trained on, we should still expect to see big alpha.
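This time symmetry is easy to see in a toy simulation (my own sketch, with all names and numbers invented for illustration): pick the best of many pure-noise strategies on a training window, then evaluate it on data extended backward and forward in time. Selection inflates the in-sample mean, while both out-of-sample windows regress toward zero — symmetrically, with no ‘publication’ event needed.

```python
import random

random.seed(0)

def mean(xs):
    return sum(xs) / len(xs)

# 200 'strategies', each just 300 days of zero-mean noise: none has real alpha
strategies = [[random.gauss(0, 1) for _ in range(300)] for _ in range(200)]

# 'publish' whichever strategy backtests best on the middle (training) window
best = max(strategies, key=lambda r: mean(r[100:200]))

print("in-sample:", mean(best[100:200]))  # inflated by selection
print("earlier:  ", mean(best[:100]))     # typically near zero
print("later:    ", mean(best[200:]))     # typically near zero
```

Alpha decay, by contrast, would leave the pre-training window intact and degrade only the post-publication one.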
This is exactly what /u/chiefkul reports doing. They report: “For the papers that I could reproduce, all of them failed regardless of whether you go back or forwards [in time]”.
So the ontological defence seems suspect.
The second defence in finance is a sociological one: only bad strategies are published, if you had a good strategy then you would simply use it to make money instead of sharing it with the public. Or alternatively: academia consistently underperforms the ‘real world’. Any actually good strategy doesn’t become a paper, it becomes a hedge fund.
This is a convenient defence. The finance equivalent of security through obscurity.
This sociological defence certainly sounds plausible until we look at the real-world performance of actively managed funds. If we look back from 2019 at the performance of large-cap funds versus the S&P 500, then over the prior year nearly 2/3 of actively managed funds underperformed the S&P 500. Over the last 10 years, more than 8 in 10 underperformed. And over the last 15 years, around 9 in 10 actively managed large-cap funds underperformed the S&P 500.
So even if the best strategies are being turned into hedge funds, it certainly doesn’t seem that they provide consistent returns. This makes the sociological defence seem suspect.
At this point, we might start to reflect on finance’s strained relationship with cross-validation. Or re-read Bailey et al. (2014) Pseudo-Mathematics and Financial Charlatanism. It is tempting to start nodding along with /u/chiefkul as they confirm our priors about the failures of finance.
But should we trust /u/chiefkul?
This is an important question in science as well: what does a failed replication tell us? How do we know that the replication was done well? With the excitement for failed replications, how can we be confident that the replication effort itself doesn’t suffer from sampling bias or p-hacking?
A lot of this is addressed by doing careful open science and by pre-registration. By sharing our datasets and methods.
In the case of the claimed failed replication of 130+ papers, we have none of this. Other redditors, like /u/programmerChilli on /r/slatestarcodex or /u/spogett on /r/algotrading, are skeptical of this replication effort. Given that most papers are poorly written, most methods are underdescribed, and datasets are hard to get, it is a challenge for most people to reproduce even a single paper. Claiming to reproduce 130+ in 7 months is a stretch. Especially given that, when asked for their code, /u/chiefkul responds with comments like:
Honestly none of it is particularly good and it’s just a complete mess now. As I was trying to explain I kinda lumped papers together so I’d build a script for TA+ML for example and then I’d just keep editing it for every paper that was in that category.
As /u/spogett writes on the original post: “this post does not pass the smell test”.
But it is certainly attention-grabbing, both on Reddit and on twitter.
This should remind us of a final point to keep in mind when thinking about the replication crisis. If we are sceptical of how first-order studies might or might not replicate, we should also be sceptical of second-order replication studies. If we want to hold first-order studies to higher standards then we should hold the second-order to even higher standards. To lead by example.
I am now left in an awkward position. If /u/chiefkul’s post is a publicity stunt or just trolling then how can we learn anything from it? What was the point of reading /u/chiefkul’s post, or worse yet, my post about their post? If in the end we can be no more or less confident that there is or isn’t a replication crisis in algorithmic trading, then what can we possibly learn for how we approach the replication crises in science?
Did you just waste 10 minutes of your time by reading this post? Did I waste several hours by writing it?
I hope not.
As I tried to argue last week in the context of mathematics: it isn’t just the outcome state (the theorem) that matters but the process (the ideas and proof techniques).
Even if /u/chiefkul’s report is a publicity stunt, by analyzing it and thinking about it, we can learn things about the replication crisis. What we learn in the process is valuable even if the finding of ‘almost all results in algorithmic trading don’t replicate’ turns out to be false. Of course, in the case of a publicity stunt, we learn less than we would have from genuine results. But either way, we don’t completely waste our time and efforts.
So what does this mean for introducing more mathematical modelling to fields like psychology or cancer research?
We need to be wary of mathematical models that only take accepted or expected empirical foundations and turn them into surprising results. Instead, we need to do work that can be useful even if its empirical grounding is shaky. We need to write papers that combine a surprising conclusion based on the facts we currently believe with extensions of methodology or new techniques that can be useful even if those ground ‘facts’ turn out to be false.
We need to not only use mathematics and statistics to transform historical data into new predictions, but also to develop new mathematics and statistics that can still be worth studying if that historic data turns out to be bad or the predictions aren’t realized. And we need to share these techniques in an open and accessible way.
Bailey, D., Borwein, J., de Prado, M.L., & Zhu, Q. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the American Mathematical Society, 61(5).
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391): 531.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T. H., Huber, J., Johannesson, M., … & Heikensten, E. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280): 1433-1436.
Hunter, J. E. (2001). The desperate need for replications. Journal of Consumer Research, 28(1): 149-158.
Ioannidis, J.P. (2005a). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2): 218-228.
Ioannidis, J. P. (2005b). Why most published research findings are false. PLoS Medicine, 2(8): e124.
Ioannidis, J. P. (2016). Why most clinical research is not useful. PLoS Medicine, 13(6): e1002049.
Stagge, J. H., Rosenberg, D. E., Abdallah, A. M., Akbar, H., Attallah, N. A., & James, R. (2019). Assessing data availability and research reproducibility in hydrology and water resources. Scientific Data, 6: 190030.
On the way to the blackberry trails, we passed a perfectly fine Waitrose — a supermarket that sells (among countless other things) jam. A supermarket I had to go to later anyways to get jamming sugar. Why didn’t we just buy the blackberries or the jam itself? It wasn’t a matter of money: several hours of our time picking berries and cooking them cost much more than a half-litre of jam, even from Waitrose.
I think that we spent time picking the berries and making the jam for the same reason that mathematicians prove theorems.
Imagine that you had a machine where you put in a statement and it replied with perfect accuracy if that statement was true or false (or maybe ill-posed). Would mathematicians welcome such a machine? It seems that Hilbert and the other formalists at the start of the 20th century certainly did. They wanted a process that could resolve any mathematical statement.
Such a hypothetical machine would be a Waitrose for theorems.
But is math just about establishing the truth of mathematical statements? More importantly, is the math that is written for other mathematicians just about establishing the truth of mathematical statements?
I don’t think so.
Math is about ideas. About techniques for thinking and proving things. Not just about the outcome of those techniques.
This is true of much of science and philosophy, as well. So although I will focus this post on the importance of process over state/outcome in pure math, I think it can also be read from the perspective of process over state in science or philosophy more broadly.
It is tempting to switch from blackberries to fish and summarize the focus on process over outcome with the proverb attributed to Anne Isabella Thackeray Ritchie: “Give a man a fish and you feed him for a day; teach a man to fish and you feed him for life.”
So in the context of mathematics, it is tempting to adapt the above as: “Give a mathematician a theorem, you satisfy her for a day; teach a mathematician a new proof technique and you satisfy her for life”.
But both of these are too outcome-focused for my taste. Too grounded in fish. They both put the outcome state as primary and only value the process in that it achieves the state. I don’t think that this is the case in pure mathematics.
Let’s return to the opening machine that replied with perfect accuracy if a given statement was true or false. In Lady Ritchie’s metaphor, this would be like the supermarket or sushi bar around the corner: you can easily get food anytime without having to know how to fish.
Yet sometimes people with easy access to supermarkets or sushi bars still like to go fishing. They don’t go fishing because they need to eat some fish, they go fishing because they like to go fishing.
I think this is also the case for mathematicians.
In many cases, mathematicians prove theorems not because they need to know the truth-value of those statements but because they like proving theorems. But this liking isn’t a purely meditative one: they don’t want to reprove the same theorem over and over again. They want to find new ways to prove new kinds of theorems.
Mathematicians are after tricks, not theorems.
Recently, an interesting discussion was started on /r/math by /u/Junglemath on horrible words that should never be used anywhere in math. In particular, they were concerned about words like ‘obviously’, ‘clearly’, ‘trivially’, ‘easy to see’, ‘doesn’t warrant a proof’, and ‘left as an exercise to the reader’. If our goal is to establish the truth of mathematical statements then these words have no place in mathematics. They are just invitations to make mistakes. The thinking goes: if something is obvious then just write down the proof.
But our primary goal isn’t to establish the truth of mathematical statements.
If you are writing a paper for other mathematicians then it is important to establish what new trick you develop. And there is no reason to bore the reader with applications of existing and well-known tricks. Good taste and insight in mathematics come from recognizing when some approach or technique is novel or likely to be widely relevant.
This brings us back to the Reddit discussion on horrible words.
In this context, words like trivially, clearly, obviously, and the omission of proofs are justifiable. They help us focus our attention on the heart of an idea. On the new gem that powers the proof. As such, seeing those words in textbooks or lectures is the author or instructor training us in good taste. It is why it is okay for an instructor to skip steps in a proof with ‘obviously’ but not as good for the student to do so in their writing. Still, I try to teach this skill to my students. I encourage them to write a proof sketch drawing my attention to the ‘central’ parts of the proof before they embark on giving a fully fleshed out proof. This gives them an opportunity to identify which part of the proof is the central idea and which parts are routine applications of straightforward techniques.
Within the myth of genius of mathematics, there is the expectation that great mathematicians simply see the truth or falsehood of mathematical statements. That things are trivial for them that are not trivial for us. As such, it might seem like it would be impossible to follow somebody like Terry Tao when he writes that ‘this part is trivial’. But I think this is a mistaken view of math.
One of the things that makes Terry Tao a great mathematician — alongside developing new ideas — is the ability to identify which ideas are old and which ideas are new. A good mathematician can identify which parts of their proof are non-standard and new (thus, not trivial) and which parts are rehashings of standard (sometimes very complex) techniques. Hence, the reason it is useful to know what Tao finds trivial vs non-trivial is not that it highlights his ability to see the truth of statements like some sort of oracle of Los Angeles. What matters is that Tao can point us to which techniques are new (and thus we should learn and incorporate them into our own mathematical tool-set) and which techniques are applications of things we already know how to do. This is very useful information.
Thus, when used correctly, ‘trivially’, ‘clearly’, and ‘obviously’ are great signposts. As long as you are using them to highlight new ideas and processes instead of to establish the truth of statements.
As for the machine that establishes the truth or falsehood of any statement: apart from being impossible (thank you, Halting Problem), it would be rather unfortunate. We understand why something is true not through the truth value of a statement but through building and thinking about proofs. For me, this hypothetical machine would fall under machine learning without understanding. Unless, of course, it produced not just truth values but human-readable proofs that could pass Ganesalingam & Gowers’ Mathematical Turing test. But in that case, we might as well call that machine another mathematician and welcome it into our community with open arms.
I think that similar sentiments can also be found in science and philosophy more broadly. I especially like how this sentiment is turned in on itself in theoretical computer science and parts of math to find the limits and barriers to our best proof techniques.
It is certainly fun to have useful reliable knowledge come out as the outcome state of our process of inquiry. Just as it might be great to eat that bass the fishers caught on their fishing trip, or fun for Maylin and me to enjoy our jam. And there might also be practical disciplines that only focus on producing useful knowledge — much like a fishing boat or a commercial farm feeds a supermarket. But I suspect that for most scientists and philosophers, just as with mathematicians or recreational fishers, it is the process and not the outcome that matters most. Reliable knowledge as an outcome state is just a wonderful bonus.
[T]here turn out to be nine and sixty ways of constructing power laws, and every single one of them is right, in that it does indeed produce a power law. Power laws turn out to result from a kind of central limit theorem for multiplicative growth processes, an observation which apparently dates back to Herbert Simon, and which has been rediscovered by a number of physicists (for instance, Sornette). Reed and Hughes have established an even more deflating explanation (see below). Now, just because these simple mechanisms exist, doesn’t mean they explain any particular case, but it does mean that you can’t legitimately argue “My favorite mechanism produces a power law; there is a power law here; it is very unlikely there would be a power law if my mechanism were not at work; therefore, it is reasonable to believe my mechanism is at work here.” (Deborah Mayo would say that finding a power law does not constitute a severe test of your hypothesis.) You need to do “differential diagnosis”, by identifying other, non-power-law consequences of your mechanism, which other possible explanations don’t share. This, we hardly ever do.
The curse of this multiple-realizability comes up especially when power-laws intersect with the other great field of complexology: networks.
I used to be very interested in this intersection. I was especially excited about evolutionary games on networks. But I was worried about some of the arbitrary-seeming approaches in the literature to generating random power-law graphs. So before starting any projects with them, I took a look into my options. Unfortunately, I didn’t go further with the exploration.
Recently, Raoul Wadhwa has gone much more in-depth in his thinking about graphs and networks. So I thought I’d share some of my old notes on generating random power-law graphs in the hope that they might be useful to Raoul. These notes are half-baked and outdated, but maybe still fun.
Hopefully, you will find them entertaining, too, dear reader.
What is a good definition of a random power-law graph? This is a difficult question because it has many words that seem like they are precisely defined but aren’t. Let’s start with ‘random’. This is extremely vague, but in practice, it is often used to mean something like ‘unbiased’ or ‘unstructured’ or ‘unparticular’. Of course, it can never really mean that since randomness is a great way to generate structure. But the best we can do to work towards ‘unbiased’ is to take a look at what we mean with existing, more clearly defined objects like random k-regular graphs.
So let’s look at how random is defined here.
First, what is a k-regular graph? It is a graph on n vertices where each vertex has exactly k neighbours. These don’t always exist, so for simplicity let us suppose that k < n and that kn is even. If we wanted to look at the degree distribution of such a graph then it would be zero everywhere except for a mass of 1 at k.
Now, what does it mean to be a random k-regular graph? Given any particular graph, one cannot say it is random. Instead, one talks about a distribution over graphs and links it to counting. The distribution of random k-regular graphs, usually called G_{n,k}, is then the uniform distribution over all k-regular graphs on n vertices. In other words, if there are m many k-regular graphs then the probability of any one of them is 1/m.
If we want to show that we have a randomized algorithm that generates a random k-regular graph then we need to prove that, given n and k as input, the algorithm produces any specific k-regular graph on n vertices with a probability of exactly 1/m (where m is the number of k-regular graphs on n vertices). Or if we are willing to be happy with approximate algorithms then we want the probability to be approximately 1/m. If we cannot prove that our algorithm has this property then we should not say that it generates random k-regular graphs. Instead, we should just say it generates k-regular graphs from an unknown distribution. This is not desirable since we probably don’t know — nor have a good way to learn — the biases of this unknown distribution.
Thankfully, there are some straightforward algorithms for generating random k-regular graphs and Marcel Montrey has discussed them before on TheEGG.
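Those posts aren’t excerpted here, but one standard approach is the pairing (or configuration) model: give each vertex k stubs, pair the stubs uniformly at random, and reject any pairing that creates a self-loop or a repeated edge. Conditioned on acceptance, the result is uniform over simple k-regular graphs. A minimal Python sketch (my own illustration, not necessarily the algorithm from Marcel’s posts):

```python
import random

def random_k_regular(n, k, max_tries=1000):
    """Sample a uniformly random simple k-regular graph on n vertices.

    Pairing model: each vertex gets k stubs, stubs are paired uniformly
    at random, and the pairing is rejected if it creates a self-loop or
    a repeated edge. Conditioned on acceptance, the resulting graph is
    uniform over all simple k-regular graphs on n vertices.
    """
    assert k < n and (n * k) % 2 == 0, "no k-regular graph exists otherwise"
    for _ in range(max_tries):
        stubs = [v for v in range(n) for _ in range(k)]
        random.shuffle(stubs)
        edges = set()
        simple = True
        for i in range(0, len(stubs), 2):
            u, v = stubs[i], stubs[i + 1]
            if u == v or (min(u, v), max(u, v)) in edges:
                simple = False  # reject this pairing and start over
                break
            edges.add((min(u, v), max(u, v)))
        if simple:
            return edges
    raise RuntimeError("rejection kept failing; k is likely too large")

# A 3-regular graph on 10 vertices always has (10 * 3) / 2 = 15 edges.
g = random_k_regular(10, 3)
```

Rejection becomes impractical as k grows (the acceptance probability decays roughly like exp(-k²/4)), which is where the more careful algorithms come in.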
How would we adapt the definition of random k-regular graphs to random power-law networks?
The issue is that ‘power-law’ does not specify a particular list of degrees. Instead, power-law specifies a distribution of degrees from which we can imagine the set of degrees of our particular graphs is sampled. Hence, the first step in thinking about a random power-law graph is to make a random power-law list of degrees. We can do this by picking the parameters of our power-law — which is not itself an unambiguous decision — and thus fixing a particular degree distribution P.
We can then proceed to take n many samples d_1, …, d_n from our power-law distribution, and call the resulting list D = (d_1, …, d_n) our list of degrees. We would have a k-regular graph if every d_i = k, but obviously, this is an extremely unlikely event. But we can still finish the definition of a random power-law graph in the same way as a random regular graph. i.e. let us just take the uniform distribution over all graphs that have the degree list D.
So to reiterate: a particular graph is a random power-law graph if it has the same probability of being generated as under the following two-stage process: (1) sample a degree list D = (d_1, …, d_n) from the power-law distribution and (2) sample uniformly at random from all graphs with degree list D.
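Concretely, the first stage could look like the following sketch. The exponent gamma and the cutoff d_max are exactly the parameters that have to be chosen, and I also resample until the total degree is even, since an odd total can never be realized by any graph:

```python
import random

def power_law_degree_list(n, gamma=2.5, d_max=100):
    """Sample a degree list d_1, ..., d_n with P(d) proportional to d^(-gamma).

    The exponent gamma and the cutoff d_max pin down the particular
    power-law distribution. We resample until the total degree is even,
    since an odd sum cannot be the degree list of any graph.
    """
    support = range(1, d_max + 1)
    weights = [d ** (-gamma) for d in support]
    while True:
        degrees = random.choices(support, weights=weights, k=n)
        if sum(degrees) % 2 == 0:
            return degrees

D = power_law_degree_list(50)
```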
This is where we hit our first hurdle: not every degree list can be realized by some graph. Some degree lists are simply impossible.
Thankfully, there is a straightforward algorithm for checking if a given degree list is legal. In the case of simple graphs, one can use the Havel-Hakimi algorithm. But for simplicity, I want to focus on the case where multiple edges are allowed between a pair of vertices. In that case, we can use an alternative algorithm that I’ve described before on the computer science StackExchange.
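The Havel-Hakimi test for the simple-graph case is short enough to sketch in full before moving on (a minimal version of the textbook procedure, assuming a list of non-negative integer degrees):

```python
def is_graphical(degrees):
    """Havel-Hakimi: can this degree list be realized by a simple graph?

    Repeatedly remove the largest degree d and subtract 1 from the next
    d largest degrees. The list is graphical iff this process reaches
    all zeros without ever needing more neighbours than remain or
    driving a degree negative.
    """
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)
        if d > len(seq):
            return False  # not enough remaining vertices to attach to
        for i in range(d):
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True
```

For example, the degree list of a triangle (2, 2, 2) passes, while a lone vertex of degree 1 fails.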
Suppose we want to realize the degree list d_1, …, d_n. Create a complete graph on n vertices. For each vertex v_i, split it into d_i copies. Split here means: create d_i copies of v_i, each with edges to every vertex that v_i has an edge to, but no edges to the other copies of v_i. If d_i = 0 then simply remove the vertex. In the new graph, call these vertices v_i^1, …, v_i^{d_i} for each i.
Once you are done, you have a very dense graph on d_1 + … + d_n vertices; call this graph G'. Pick your favorite algorithm for maximum matching (since the graph is so dense, you should probably use one of the fast matrix-multiplication based algorithms) and run it on G'. This will return a matching M. If the matching is not perfect (i.e. if it does not cover every vertex) then your degree distribution was impossible; so return no.
If you have a perfect matching M, then remove all edges not in M from G', and then for every i merge the d_i many vertices v_i^1, …, v_i^{d_i} into one vertex v_i. Merging two vertices means combining them into one, such that the resulting vertex has edges to every vertex that the originals had an edge to. If two originals had an edge to the same vertex u then the merged vertex will have two edges between v_i and u.
Call the resulting graph G; it has the desired degree distribution.
The resulting runtime is O((d_1 + … + d_n)^ω) where ω is the constant for the fastest matrix-multiplication algorithm (which at the time of writing is about 2.373). In terms of the number of vertices n in the resulting graph, in the worst case of the degree distribution being dense, we have a runtime of O(n^{2ω}).
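Reproducing a general maximum-matching subroutine here would take us too far afield, so as a stand-in let me sketch a more direct construction for the multigraph case. It uses the fact that a degree list is realizable by a loopless multigraph exactly when its sum is even and no single degree exceeds the sum of all the others; given that, greedily joining the two largest remaining degrees always succeeds. (This is my own illustrative replacement, not the split-and-match algorithm above.)

```python
def realize_multigraph(degrees):
    """Build a loopless multigraph with the given degree list, if possible.

    Feasibility: the sum must be even and no degree may exceed the sum
    of all the others. Given feasibility, repeatedly joining the two
    vertices with the largest remaining degrees always succeeds.
    Returns a list of edges (with repeats allowed), or None if impossible.
    """
    if not degrees:
        return []
    if sum(degrees) % 2 != 0 or 2 * max(degrees) > sum(degrees):
        return None  # impossible degree list
    remaining = {v: d for v, d in enumerate(degrees) if d > 0}
    edges = []
    while remaining:
        # the two vertices with the largest remaining degree
        u, v = sorted(remaining, key=remaining.get, reverse=True)[:2]
        edges.append((u, v))
        for w in (u, v):
            remaining[w] -= 1
            if remaining[w] == 0:
                del remaining[w]
    return edges
```

Note that this only decides realizability and produces one witness; it says nothing about sampling uniformly, which is the harder problem discussed next.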
This gives a particular graph. But we wanted a random one.
The reason I presented the above algorithm instead of Havel-Hakimi is because I think it is relatively straightforward to modify into an algorithm for sampling random (not necessarily simple) power-law graphs.
In particular, all we need to do is replace our favorite algorithm for finding a perfect matching by an algorithm for sampling a perfect matching uniformly at random.
For this, you could use the classic algorithm of Jerrum & Sinclair (1989):
Jerrum, M., & Sinclair, A. (1989). Approximating the permanent. SIAM Journal on Computing, 18(6), 1149-1178.
Unfortunately, the Jerrum-Sinclair algorithm relies on rapidly mixing Markov chains and thus only gives an approximation to uniform sampling. Perfect uniform sampling is probably impossible due to the correspondence between uniform sampling and counting, and the fact that counting the number of perfect matchings is ♯P-complete.
Of course, this doesn’t mean that it is impossible to perfectly sample random power-law graphs. I mostly just wanted to provide the above sketch of an algorithm to show how finding an algorithm for the seemingly obvious question of generating a random power-law graph quickly leads to some deep computer science. At least if we want precise and rigorous results.
The approach I sketch above is not very good. In particular, the limitation to non-simple graphs is a bit annoying. For better work, I would recommend the following papers:
Viger, F., & Latapy, M. (2005). Efficient and simple generation of random simple connected graphs with prescribed degree sequence. Computing and Combinatorics, 440-449.
Blitzstein, J., & Diaconis, P. (2011). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics, 6(4), 489-522.
I think that people have even played with implementations of these algorithms in practical experimental settings:
Gkantsidis, C., Mihail, M., & Zegura, E. (2003). The Markov Chain simulation method for generating connected power law random graphs. In Proc. 5th Workshop on Algorithm Engineering and Experiments (ALENEX). (pdf)
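The Markov chains in work like the paper above are built from degree-preserving double edge swaps. A minimal sketch of that move (my own illustration, not the paper’s implementation):

```python
import random

def double_edge_swap(edges, n_swaps=1000):
    """Randomize a simple graph while preserving every vertex's degree.

    Each step picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting swaps that would create a self-loop or
    a repeated edge. Run long enough, the chain approaches a roughly
    uniform sample over simple graphs with the same degree sequence;
    bounding the mixing time rigorously is the hard part.
    """
    edges = [tuple(e) for e in edges]  # work on a copy
    edge_set = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = random.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap would make a self-loop
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # swap would create a repeated edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Randomizing a 6-cycle: every vertex keeps degree 2 throughout.
out = double_edge_swap([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
```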
As for the ideal of sampling a degree sequence from a power-law distribution and creating a graph with that degree sequence: it seems that physicists have done some similar things non-rigorously:
Catanzaro, M., Boguñá, M., & Pastor-Satorras, R. (2005). Generation of uncorrelated random scale-free networks. Physical Review E, 71(2), 027103.
The behind the scenes discussion building up to this launch was one of the motivators for my post on twitter vs blogs and science advertising versus discussion. And as you might expect, dear reader, it was important to me that this new community blog wouldn’t be just about science outreach and advertising of completed work. For me — and I think many of the editors — it is important that the blog is a place for science engagement and for developing new ideas in the open. A way to peel back the covers that hide how science is done and break the silos that inhibit a collaborative and cooperative atmosphere. A way to not only speak at the public or other scientists, but also an opportunity to listen.
For me, the blog is a challenge to the community. A challenge to engage in more flexible, interactive, and inclusive development of new ideas than is possible with traditional journals. While also allowing for a deeper, more long-form and structured discussion than is possible with twitter. If you’ve ever written a detailed research email, had a long discussion on Slack, or been part of an exciting journal club, lab meeting, or seminar, you know the amount of useful discussion that is foundational to science but that seldom appears in public. My hope is that we can make these discussions more public and more beneficial to the whole community.
Before pushing for the project, David made sure that he knew the lay of the land. He assembled a list of the existing blogs on computational and mathematical oncology. In our welcome post, I made sure to highlight a few of the examples of our community members developing new ideas, sharing tools and techniques, and pushing beyond outreach and advertising. But since we wanted the welcome post to be short, there was not the opportunity for a more thorough survey of our community.
In this post, I want to provide a more detailed — although never complete nor exhaustive — snapshot of the blogging community of computational and mathematical oncologists. At least the part of it that I am familiar with. If I missed you then please let me know. This is exactly what the comments on this post are for: expanding our community.
Here are the blogs alphabetically by primary author. As a disclaimer, I was not familiar with some of these blogs before David’s spreadsheet introduced them to me, and so my snapshots are incomplete. For each blog, I have tried to highlight a few posts that I’ve found particularly interesting. In some cases, these posts have spawned discussions on twitter or here on TheEGG or other blogs, and so I occasionally highlight those responses as well:
Prior to the new computational and mathematical oncology blog launching, the Theory, Evolution, and Games Group encouraged posts from members of the mathematical oncology community and includes contributions on oncology from: Vincent Cannataro on dark selection from spatial cytokine signaling networks; Jill Gallaher on diversity working together: cancer, immune system, and microbiome; Philip Gerlee and Philipp Altrock ask is cancer really a game?; David Robert Grimes on oxygen fueling dark selection in the bone marrow; Dan Nichol on how evolutionary non-commutativity suggests novel treatment strategies; Rob Noble on cancer, bad luck, and a pair of paradoxes; Robert Vander Velde on cancer metabolism and voluntary public goods games, and ratcheting and the Gillespie algorithm for dark selection; and Matthew Wicker on identifying therapy targets & evolutionary potentials in ovarian cancer.
In the future, however, I will be directing mathonco writers and posting my own mathematical oncology contributions to the new blog. If you want to pitch a post idea for the new blog, please feel free to email me or chat with me in person if you’re in the Oxford area.
Finally, are there any blogs that I missed? Or any particularly exciting posts that I should have highlighted? Please let me know.
Maybe I should make a blogroll.