Software monocultures, imperialism, and weapons of math destruction

This past Friday, Facebook reported that they suffered a security breach that affected at least 50 million users. ‘Security breach’ is a bit of newspeak that is meant to hint at active malice and attribute fault outside the company. But as far as I understand it — and I am no expert on this — it was just a series of three bugs in Facebook’s “View As” feature that together allowed people to get the access tokens of whomever they searched for. This is, of course, bad for your Facebook account. The part of this story that really fascinated me, however, is how this affected other sites: that access token would let somebody access not only your Facebook account but also any other website where you use Facebook’s Single Sign On feature.

This means that a bug that some engineers missed at Facebook compromised the security of users on completely unrelated sites like, say, StackExchange (SE) or Disqus — or any site that you can log into using your Facebook account.
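To make the knock-on mechanism concrete, here is a minimal sketch, assuming a Graph-API-style identity endpoint, of the relying site’s side of a Facebook Single Sign On login. The site trusts Facebook’s word on who the token belongs to, so anyone holding a leaked token passes this check as the victim:

```python
import requests

def login_with_facebook(access_token):
    """Sketch of a relying site's side of Single Sign On: the site
    trusts Facebook's word on who the token belongs to, so a leaked
    token logs the attacker in as the victim (no password needed)."""
    resp = requests.get(
        "https://graph.facebook.com/me",
        params={"access_token": access_token, "fields": "id,name"},
    )
    resp.raise_for_status()
    profile = resp.json()  # e.g. {"id": "...", "name": "..."}
    # A real site would now look up (or create) its own account
    # keyed by profile["id"] and start a session for it.
    return profile
```

Note that nothing in this flow depends on the relying site’s own code being buggy; the trust in the token is the whole point of Single Sign On.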

A case of software monoculture — a nice metaphor I was introduced to by Jonathan Zittrain.

This could easily have knock-on effects for security. For example, I am one of the moderators for the Theoretical Computer Science SE and also the Psychology and Neuroscience SE. Due to this, I have the potential to access certain non-public information of SE users like their IP addresses and hidden contact details. I can also send communications that look much more official, alongside expected abilities like bans, suspensions, etc. Obviously, part of my responsibility as a moderator is to only use these abilities for proper reasons. But if I had used Facebook — disclosure: I don’t use Facebook — for my SE login then a potential hacker could get access to these abilities and then attempt phishing or other attacks even on SE users that don’t use Facebook.

In other words, the people in charge of security at SE have to worry not only about their own code but also Facebook (and Google, Yahoo!, and other OpenIDs).

Of course, Facebook is not necessarily the worst case of software monoculture or knock-on effects that security experts have to worry about. Exploits in operating systems, browsers, servers, and standard software packages (especially security ones) can be even more devastating to the software ecology.

And exploits of aspects of social media other than login can have effects subtler than compromised security.

The underlying issue is a lack of diversity in tools and platforms. A case of having all our eggs in one basket. Of minimizing individual risk — by using the best available or most convenient system — at the cost of increasing systemic risk — because everyone else uses the same system.

We see the same issues in human projects outside of software. Compare this to the explanations of the 2008 financial crisis that focused on individual vs systemic risk.

But my favourite example is the banana.

In this post, I’ll sketch the analogy between software monoculture and agricultural monoculture. In particular, I want to focus on a common element between the two domains: the scale of imperial corporations. It is this scale that turns mathematical models into weapons of math destruction. Finally, I’ll close with some questions on whether this analogy can be turned into tool transfer: can ecology and evolution help us understand and manage software monoculture?

Read more of this post

Five motivations for theoretical computer science

There are some situations, perhaps lucky ones, where it is felt that an activity needs no external motivation or justification.  For the rest, it can be helpful to think of what the task at hand can be useful for. This of course doesn’t answer the larger question of what is worth doing, since it just distributes the burden somewhere else, but establishing these connections seems like a natural part of an answer to the larger question.

Along those lines, the following are five intellectual areas that theoretical computer science concepts and their development can be useful for – so a curiosity about these areas can provide some motivation for learning about those cstheory concepts or developing them. They are arranged from the ones likely most obvious to most people to the least so: technology, mathematics, science, society, and philosophy. This post could also serve as an homage to delayed gratification (perhaps with some procrastination mixed in), having finally been written up more than three years after first discussing it with Artem.

Read more of this post

Evolutionary non-commutativity suggests novel treatment strategies

In the Autumn of 2011 I received an email from Jacob Scott, now a good friend and better mentor, who was looking for an undergraduate to code an evolutionary simulation. Jake had just arrived in Oxford to start his DPhil in applied mathematics and by chance had dined at St Anne’s College with Peter Jeavons, then a tutor of mine, the evening before. Jake had outlined his ideas, Peter had supplied a number of email addresses, Jake sent an email and I uncharacteristically replied saying I’d give it a shot. These unlikely events led me to where I am today — a DPhil candidate in the Oxford University Department of Computer Science. My project with Jake was a success and I was invited to speak at the 2012 meeting of the Society for Mathematical Biology in Knoxville, TN. There I met one of Jake’s supervisors, Alexander Anderson, who invited me to visit the Department of Integrated Mathematical Oncology at the Moffitt Cancer Center and Research Institute for a workshop in December of that year. There Dr. Anderson and I discussed one of the key issues with the work I will present in this post, issues that now form the basis of my DPhil with Dr. Anderson as one of two supervisors. Fittingly, the other is Peter Jeavons.

Jake was considering the problem of treating and avoiding drug resistance and in his short email provided his hypothesis as a single question: “Can we administer a sequence of drugs to steer the evolution of a disease population to a configuration from which resistance cannot emerge?”

In Nichol et al. (2015), we provide evidence for an affirmative answer to this question. I would like to use this post to introduce you to our result, and discuss some of the criticisms.
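As a preview of what ‘non-commutativity’ means here, consider a toy example of my own (not the model from the paper): treat each drug as a stochastic matrix pushing a population distribution between genotypes. Since matrix products do not commute, the order of treatments changes where the population ends up:

```python
import numpy as np

# Two toy treatments as stochastic matrices over three genotypes
# (rows sum to 1). All numbers are invented for illustration.
drug_A = np.array([[0.8, 0.2, 0.0],
                   [0.0, 0.9, 0.1],
                   [0.0, 0.0, 1.0]])
drug_B = np.array([[0.5, 0.0, 0.5],
                   [0.3, 0.7, 0.0],
                   [0.0, 0.1, 0.9]])

population = np.array([1.0, 0.0, 0.0])  # everyone starts at genotype 0

print(population @ drug_A @ drug_B)  # A then B: [0.46 0.14 0.4 ]
print(population @ drug_B @ drug_A)  # B then A: [0.4  0.1  0.5 ]
```

If one of those final distributions sits in a region from which resistance cannot evolve and the other does not, then the order of treatment matters.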

Read more of this post

Evolution is a special kind of (machine) learning

Theoretical computer science has a long history of peering through the algorithmic lens at the brain, mind, and learning. In fact, I would argue that the field was born from the epistemological question of what our minds can learn of mathematical truth through formal proofs. The perspective became more scientific with McCulloch & Pitts’ (1943) introduction of finite state machines as models of neural networks and with Turing’s B-type neural networks, paving the way for our modern treatment of artificial intelligence and machine learning. The connections to biology, unfortunately, are less pronounced. Turing ventured into the field with his important work on morphogenesis, and I believe that he could have contributed to the study of evolution but did not get the chance. This work was followed up with the use of computers in biology, and with heuristic ideas from evolution entering computer science in the form of genetic algorithms. However, these areas remained non-mathematical, with very few provable statements or non-heuristic reasoning. The task of making strong connections between theoretical computer science and evolutionary biology has been left to our generation.

Although the militia of cstheorists reflecting on biology is small, Leslie Valiant is their standard-bearer for the steady march of theoretical computer science into both learning and evolution. Due in part to his efforts, artificial intelligence and machine learning are such well developed fields that their theory branch has its own name and conferences: computational learning theory (CoLT). Much of CoLT rests on Valiant’s (1984) introduction of probably-approximately correct (PAC) learning which — in spite of its name — is one of the most formal and careful ways to understand learnability. The importance of this model cannot be overstated, and it resulted in Valiant receiving (among many other distinctions) the 2010 Turing award (i.e. the Nobel prize of computer science). Most importantly, his attention was not confined to pure cstheory: he took his algorithmic insights into biology, specifically computational neuroscience (see Valiant (1994; 2006) for examples), to understand human thought and learning.
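For readers who have not met it, the textbook PAC requirement can be stated compactly. This is the standard definition, not anything specific to Valiant’s later work on evolvability:

```latex
% A concept class C is PAC-learnable if there is a learner L such that for
% every target c in C, every distribution D, and all eps, delta in (0,1),
% given m = poly(1/eps, 1/delta, n, size(c)) labelled samples S ~ D^m:
\Pr_{S \sim D^m}\left[ \mathrm{err}_D\big(L(S)\big) \le \varepsilon \right] \ge 1 - \delta
```

In words: with high probability (the ‘probably’), the learner outputs a hypothesis with low error (the ‘approximately correct’), from only polynomially many samples.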

Like any good thinker reflecting on biology, Valiant understands the importance of Dobzhansky’s observation that “nothing in biology makes sense except in the light of evolution”. Even for the algorithmic lens it helps to have this illumination. Any understanding of learning mechanisms like the brain is incomplete without an examination of the evolutionary dynamics that shaped these organs. In the mid-2000s, Valiant embarked on the quest of formalizing some of the insights cstheory can offer evolution, culminating in his PAC-based model of evolvability (Valiant, 2009). Although this paper is one of the most frequently cited on TheEGG, I’ve waited until today to give it a dedicated post.
Read more of this post

Mathematics in finance and hiding lies in complexity

Sir Andrew Wiles

Mathematics has a deep and rich history, extending well beyond the 16th century start of the scientific revolution. Much like literature, mathematics has a timeless quality; although its trends wax and wane, no part of it becomes outdated or wrong. What Diophantus of Alexandria wrote on solving algebraic equations in the 3rd century was just as true in the 16th or 17th century as it is today. In fact, it was in 1637 in the margins of Diophantus’ Arithmetica that Pierre de Fermat scribbled the statement of his Last Theorem that the margin was too narrow to contain[1]. In modern notation it is probably the most famous of Diophantine equations: a^n + b^n = c^n, with the assertion that it has no solutions for n > 2 and a, b, c positive integers. A statement that almost anybody can understand, but one that is far from easy to prove or even approach[2].
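Since the statement is so easy to understand, it is also easy to poke at by brute force. A few lines of Python (purely illustrative; no finite search could ever prove the theorem) confirm that there are no small solutions:

```python
# Brute-force search for counterexamples to a^n + b^n = c^n with n > 2.
# Fermat's Last Theorem (proved by Wiles) guarantees this prints nothing.
for n in range(3, 8):
    for a in range(1, 50):
        for b in range(a, 50):
            target = a**n + b**n
            guess = round(target ** (1.0 / n))
            # Check neighbours of the float estimate to dodge rounding error.
            for c in (guess - 1, guess, guess + 1):
                if c > 0 and c**n == target:
                    print(a, b, c, n)
```

The gap between checking small cases and proving the theorem for all n is, of course, exactly where the three and a half centuries of difficulty lived.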
Read more of this post

Programming language for biochemistry

Computer scientists that think of nature as literally computing often take the stance that biological organisms are nothing more than protein interaction networks. For example, this is the stance that Leslie Valiant (2009) takes when defining ecorithms: biology is just a specialization of computer science focused on evolvable circuits. User @exploderator summarized the realist computational view of biology on Reddit while answering what theoretical computer science can offer biology:

[B]iology is primarily chemo-computation, chemical information systems and computational hardware.
Theoretical comp sci is the only field that is actually specifically dedicated to studying the mathematics / logic of computation. Therefore, although biology is an incredibly hard programming problem (only a fool thinks nature simple), it is indeed more about programming and less about the hardware it’s running on.

Although it is an easy stance for a theoretician to take, it is a little bit more involved for a molecular biologist, chemist, or engineer. Yet for the last 30 years, even experimentalists have been captivated by this computational realism and the promise of engineering molecular devices (Drexler, 1981). Half a year ago, I even reviewed Bonnet et al. (2013) taking steps towards building transcriptors. They are focusing on the hardware side of biological computation, building a DNA analogue of the von Neumann architecture. However, what we really need is a level of abstraction: a chemical programming language that can be compiled into biocompatible reactions.
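To give a flavour of the lowest level such a language would have to compile to, here is a minimal sketch (with an invented rate constant and molecule counts, not the output of any actual compiler) of a Gillespie-style stochastic simulation of the single reaction A + B -> C, the kind of primitive any chemical program ultimately becomes:

```python
import random

# Minimal Gillespie-style simulation of the reaction A + B -> C.
# The rate constant and molecule counts are invented for illustration.
counts = {"A": 100, "B": 80, "C": 0}
k, t = 0.01, 0.0

while counts["A"] > 0 and counts["B"] > 0:
    propensity = k * counts["A"] * counts["B"]
    t += random.expovariate(propensity)  # waiting time to the next event
    counts["A"] -= 1                     # one A and one B are consumed...
    counts["B"] -= 1
    counts["C"] += 1                     # ...producing one C

print(f"t = {t:.2f}: {counts}")          # runs until B is exhausted
```

A chemical programming language would let us write the intent (say, a logic gate or a counter) and leave the generation of reaction sets like this to the compiler.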
Read more of this post

Semi-smooth fitness landscapes and the simplex algorithm

As you might have guessed from my strange and complicated name, I’m Russian. One of the weird features of this is that even though I have never had to experience war, I still feel a strong cultural war-weariness. This stems from an ancestral memory of the Second World War, a conflict that had an extremely disruptive effect on Russian society. None of my great-grandfathers survived the war; one of them was a train engineer who died trying to drive a train of provisions over the Road of Life to resupply Leningrad during its 29-month siege. Since the Germans blocked all the land routes, part of the road ran over the ice on Lake Ladoga — trucks had to be optimally spaced to not crack the ice that separated them from a watery grave while maximizing the amount of supplies transported into the city. Leonid Kantorovich — the Russian mathematician and economist that developed linear programming as the war was starting in western Europe — ensured safety by calculating the optimal distance between cars depending on the ice thickness and air temperature. In the first winter of the road, Kantorovich would personally walk between trucks on the ice to ensure his guidelines were followed and to reassure the men of the reliability of mathematical programming. Like his British counterpart, Kantorovich was applying the algorithmic lens to help the Allied war effort and the safety of his people. Although I can never reciprocate the heroism of these great men, stories like this convince me that the algorithmic lens can provide a powerful perspective in economics, engineering, or science. This is one of the many inspirations behind my most recent project (Kaznatcheev, 2013) applying the tools of theoretical computer science and mathematical optimization — such as linear programming — to better understand the rate of evolutionary dynamics.
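As a toy illustration of the kind of trade-off Kantorovich was solving (all numbers below are invented, not his actual calculations), a linear program can maximize delivered supplies subject to an ice-load limit:

```python
from scipy.optimize import linprog

# Toy Road of Life linear program; every number below is invented.
# Variables: trucks of a light and a heavy type in one convoy.
# Maximize tonnage 1.5*light + 3.0*heavy (linprog minimizes, so negate).
c = [-1.5, -3.0]
A_ub = [[2.0, 5.0],   # total load the ice sheet can bear
        [1.0, 1.0]]   # trucks that fit at the mandated safe spacing
b_ub = [40.0, 12.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal truck mix and tonnage delivered
```

The real problem involved ice thickness, temperature, and spacing rather than truck types, but the structure is the same: a linear objective pushed to the boundary of linear constraints.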
Read more of this post

What can theoretical computer science offer biology?

If there is anything definitive that can be said about biology then it is that biology is messy and complicated. The stereotypical image of a biology major is of someone in the library memorizing volumes of (at best) loosely related facts, only to have them overturned the next semester. Concepts are related only vaguely, to the point that it looks like stamp-collecting to outsiders. The only unifying theme is evolution, and even that is presented with a smorgasbord of verbal and mathematical models, many lacking predictive power or sometimes even solid empirical foundations. This might seem like the polar opposite of a theoretical computer scientist with her strict hierarchies and logical deductions. Even the complexity zoo seems perfectly tame compared to any biological taxonomy. However, since I’ve been promoting algorithmic theories of biology and even trying my hand at applying cstheory to models of evolution (Kaznatcheev, 2013), I must think that there is some possibility for a productive partnership. So, what can theoretical computer science offer biology? CStheory can provide the abstractions and technical tools to systematize and organize biology’s heuristic models.
Read more of this post

Computational complexity of evolutionary equilibria

The first half of the 20th century is famous for revolutionary changes — paradigm shifts as Kuhn would say — across the sciences and other systems of thought. Disheartened by the scars of the First World War, European thinkers sought refuge by shifting their worldviews away from those of their fathers. In fields like biology, this meant looking for guidance to your grandfathers, instead. The founders of the modern synthesis reconciled the fading ideas of Wallace’s natural selection with Mendelian genetics. In the process, they unified many branches of biology, which at the dawn of the 20th century had been diverging, into a single paradigm that persists today. A return to evolution by natural selection illuminated their field and ended the eclipse of Darwinism. At the same time, mathematicians questioned Hilbert’s formalism and Russell’s logicism as valid foundations for their own field. As a result, they formalized mechanistic calculation and logical thought as computation and founded theoretical computer science to study its power and limitations. Even though some pioneers — like Alan Turing — kept an eye on biology, the two streams of thought did not converge. The coming of the Second World War pushed both fields away from each other and deep foundational questions, entrenching them in applied and technological work.

For the rest of the 20th century, the fields remained largely independent. Computer science only looked to biology for vague inspiration of heuristics in the form of evolutionary computing (Holland, 1975), and biologists looked at computer science as an engineering or technical field that could only provide them with applied tools like bioinformatics. Neither field saw in the other a partner for addressing deep theoretical questions. As I mentioned previously, my goal is to remedy this by applying ideas from analysis of algorithms and computational complexity to fundamental questions in biology. Sometime in late October, I tweeted my joy at seeing evolution clearly through the algorithmic lens. On Friday the 23rd, after much delay, I turned this into an arXiv preprint ambitiously titled “Complexity of evolutionary equilibria in static fitness landscapes”. The paper combines the discussion of evolution from my previous few blog posts with three simple technical results. Although it was written for theoretical computer scientists, I tried to make it accessible to mathematical biologists as well; the hope is to serve as a launching point for discussion between the two disciplines.
Read more of this post

NK and block models of fitness landscapes

On February 12, 2001, the Human Genome Project released its first formal report. It was a great day for biology, and a wonderful birthday present for Darwin. However, it was also a humbling confrontation with complexity and prompted Stephen Jay Gould to write in the New York Times:

[The Human Genome Project revealed that] Homo sapiens possesses between 30,000 and 40,000 … [only twice as many as the tiny roundworm] … The key to complexity is not more genes, but more combinations and interactions generated by fewer units of code … [their function] cannot be predicted from the separate underlying parts alone.

The one gene to one protein paradigm was overthrown: the 40k genes were simply not enough to generate the over 140k identified proteins in the human body. For evolutionary biology, the new view meant that changing one gene could affect (and in most cases would affect) many proteins and thus have several different contributions to fitness. Biologists needed a model of fitness landscapes where the genes could interact with each other, allow for varying combinations, and manifest epistasis.
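This is exactly what Kauffman’s NK model provides, and the rest of this post develops it. As a minimal sketch (a random-neighbourhood variant, with my own variable names), here is its fitness function in Python: each of the N genes contributes a random amount that depends on its own allele and on the alleles of K other genes:

```python
import random

def nk_landscape(N, K, seed=0):
    """Kauffman's NK model: gene i's fitness contribution depends on
    its own allele and the alleles of K randomly chosen other genes."""
    rng = random.Random(seed)
    neighbours = [rng.sample([j for j in range(N) if j != i], K)
                  for i in range(N)]
    tables = [{} for _ in range(N)]  # contribution tables, filled lazily

    def fitness(genotype):  # genotype is a tuple of N bits
        total = 0.0
        for i in range(N):
            key = (genotype[i],) + tuple(genotype[j] for j in neighbours[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / N

    return fitness

f = nk_landscape(N=10, K=2)
print(f((0,) * 10), f((1,) * 10))  # for K > 0, gene effects are epistatic
```

For K = 0 the genes contribute independently and the landscape is smooth with a single peak; as K grows, flipping one gene perturbs up to K + 1 contributions at once, and the landscape becomes rugged.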
Read more of this post