Generating random power-law graphs

‘Power-law’ is one of the biggest buzzwords in complexology. Almost everything is a power-law. I’ve even used it to sell my own work. But most work that deals in power-laws tends to lack rigour. And just establishing that something is a power-law shouldn’t make us feel that it is more connected to something else that is a power-law. Cosma Shalizi — the great critic of sloppy thinking in complexology — has an insightful passage on power-laws:

[T]here turn out to be nine and sixty ways of constructing power laws, and every single one of them is right, in that it does indeed produce a power law. Power laws turn out to result from a kind of central limit theorem for multiplicative growth processes, an observation which apparently dates back to Herbert Simon, and which has been rediscovered by a number of physicists (for instance, Sornette). Reed and Hughes have established an even more deflating explanation (see below). Now, just because these simple mechanisms exist, doesn’t mean they explain any particular case, but it does mean that you can’t legitimately argue “My favorite mechanism produces a power law; there is a power law here; it is very unlikely there would be a power law if my mechanism were not at work; therefore, it is reasonable to believe my mechanism is at work here.” (Deborah Mayo would say that finding a power law does not constitute a severe test of your hypothesis.) You need to do “differential diagnosis”, by identifying other, non-power-law consequences of your mechanism, which other possible explanations don’t share. This, we hardly ever do.

The curse of this multiple-realizability comes up especially when power-laws intersect with the other great field of complexology: networks.

I used to be very interested in this intersection. I was especially excited about evolutionary games on networks. But I was worried about some of the arbitrary seeming approaches in the literature to generating random power-law graphs. So before starting any projects with them, I took a look into my options. Unfortunately, I didn’t go further with the exploration.

Recently, Raoul Wadhwa has gone much more in-depth in his thinking about graphs and networks. So I thought I’d share some of my old notes on generating random power-law graphs in the hope that they might be useful to Raoul. These notes are half-baked and outdated, but maybe still fun.

Hopefully, you will find them entertaining, too, dear reader.

Read more of this post

Advertisements

Closing the gap between quantum and deterministic query complexity for easy to certify total functions

Recently, trying to keep with my weekly post schedule, I’ve been a bit strapped for inspiration. As such, I’ve posted a few times on a major topic from my past life: quantum query complexity. I’ve mostly tried to describe some techniques for (lower) bounding query complexity like the negative adversary method and span programs. But I’ve never really showed how to use these methods to actually set up interesting bounds.

Since I am again short of a post, I thought I’d share this week a simple proof of a bound possible with these techniques. This is based on an old note I wrote on 19 April 2011.

One of the big conjectures in quantum query complexity — at least a half decade ago when I was worrying about this topic — is that quantum queries give you at most a quadratic speedup over deterministic queries for total functions. In symbols: D(f) = O(Q^2(f)). Since Grover’s algorithm can give us a quadratic quantum speed-up for arbitrary total functions, this conjecture basically says: you can’t do better than Grover.

In this post, I’ll prove a baby version of this conjecture.

Let’s call a Boolean total-function easy to certify if one side of the function has a constant-length certificate complexity. I’ll prove that for easy-to-certify total functions, D(f) = O(Q^2(f)).

This is not an important result, but I thought it is a cute illustration of standard techniques. And so it doesn’t get lost in my old pdf, I thought I’d finally convert it to a blog post. Think of this as a simple application of the adversary method.

Read more of this post

The gene-interaction networks of easy fitness landscapes

Since evolutionary fitness landscapes have been a recurrent theme on TheEGG, I want to return, yet again, to the question of finding local peaks in fitness landscapes. In particular, to the distinction between easy and hard fitness landscapes.

Roughly, in easy landscapes, we can find local peaks quickly and in hard ones, we cannot. But this is very vague. To be a little more precise, I have to borrow the notion of orders of growth from the asymptotic analysis standard in computer science. A family of landscapes indexed by a size n (usually corresponding to the number of genes in the landscape) is easy if a local fitness optimum can be found in the landscapes in time polynomial in n and hard otherwise. In the case of hard landscapes, we can’t guarantee to find a local fitness peak and thus can sometimes reason from a state of perpetual maladaptive disequilibrium.

In Kaznatcheev (2019), I introduced this distinction to biology. Since hard landscapes have more interesting properties which are more challenging to theoretical biologist’s intuitions, I focused more on this. This was read — perhaps rightly — as me advocating for the existence or ubiquity of hard landscapes. And that if hard landscapes don’t occur in nature then my distinction is pointless. But I don’t think this is the most useful reading.

It certainly would be fun if hard landscapes were a feature of nature since they give us a new way to approach certain puzzles like the maintenance of cooperation, the evolution of costly learning, or open-ended evolution. But this is an empirical question. What isn’t a question is that hard landscape are a feature of our mental and mathematical models of evolution. As such, all — or most, whatever that means — fitness landscapes being easy is still exciting for me. It means that the easy vs hard distinction can push us to refine our mental models such that if only easy landscapes occur in nature then our models should only be able to express easy landscapes.

In other words, using computational complexity to build upper-bounds arguments (that on certain classes of landscapes, local optima can be found efficiently) can be just as fun as lower-bounds arguments (that on certain classes of landscapes, evolution requires at least a super-polynomial effort to find any local fitness peak). However, apart from a brief mention of smooth landscapes, I did not stress the upper-bounds in Kaznatcheev (2019).

Now, together with David Cohen and Peter Jeavons, I’ve taken this next step — at least in the cstheory context, we still need to write on the biology. So in this post, I want to talk briefly about a biological framing of Kaznatcheev, Cohen & Jeavons (2019) and the kind of fitness landscapes that are easy for evolution.

Read more of this post

Span programs as a linear-algebraic representation of functions

I feel like TheEGG has been a bit monotone in the sort of theoretical computer science that I’ve been writing about recently. In part, this has been due to time constraints and the pressure of the weekly posting schedule (it has now been over a year with a post every calendar week); and in part due to my mind being too fixated on algorithmic biology.

So for this week, I want to change things up a bit. I want to discuss some of the math behind a success of cstheory applied to nature: quantum computing. It’s been six years since I blogged about quantum query complexity and the negative adversary method for lower bounding it. And it has been close to 8 years since I’ve worked on the topic.

But I did promise to write about span programs — a technique used to reason about query complexity. So in this post, I want to shift gears to quantum computing and discuss span programs. I doubt this is useful to thinking about evolution, but it never hurts to discuss a cool linear-algebraic representation of functions.

I started writing this post for the CSTheory Community Blog. Unfortunately, that blog is largely defunct. So, after 6 years, I decided to post on TheEGG instead.

Please humour me, dear reader.

Read more of this post

Quick introduction: the algorithmic lens

Computers are a ubiquitous tool in modern research. We use them for everything from running simulation experiments and controlling physical experiments to analyzing and visualizing data. For almost any field ‘X’ there is probably a subfield of ‘computational X’ that uses and refines these computational tools to further research in X. This is very important work and I think it should be an integral part of all modern research.

But this is not the algorithmic lens.

In this post, I will try to give a very brief description (or maybe just a set of pointers) for the algorithmic lens. And of what we should imagine when we see an ‘algorithmic X’ subfield of some field X.

Read more of this post

From perpetual motion machines to the Entscheidungsproblem

Turing MachineThere seems to be a tendency to use the newest technology of the day as a metaphor for making sense of our hardest scientific questions. These metaphors are often vague and inprecise. They tend to overly simplify the scientific question and also misrepresent the technology. This isn’t useful.

But the pull of this metaphor also tends to transform the technical disciplines that analyze our newest tech into fundamental disciplines that analyze our universe. This was the case for many aspects of physics, and I think it is currently happening with aspects of theoretical computer science. This is very useful.

So, let’s go back in time to the birth of modern machines. To the water wheel and the steam engine.

I will briefly sketch how the science of steam engines developed and how it dealt with perpetual motion machines. From here, we can jump to the analytic engine and the modern computer. I’ll suggest that the development of computer science has followed a similar path — with the Entscheidungsproblem and its variants serving as our perpetual motion machine.

The science of steam engines successfully universalized itself into thermodynamics and statistical mechanics. These are seen as universal disciplines that are used to inform our understanding across the sciences. Similarly, I think that we need to universalize theoretical computer science and make its techniques more common throughout the sciences.

Read more of this post

Quick introduction: Problems and algorithms

For this week, I want to try a new type of post. A quick introduction to a standard topic that might not be familiar to all readers and that could be useful later on. The goal is to write a shorter post than usual and provide an launching point for future more details discussion on a topic. Let’s see if I can stick to 500 words — although this post is 933, so — in the future.

For our first topic, let’s turn to theoretical computer science.

There are many ways to subdivide theoretical computer science, but one of my favorite divisions is into the two battling factions of computational complexity and algorithm design. To sketch a caricature: the former focus on computational problems and lower bounds, and the latter focus on algorithms and upper bounds. The latter have counter-parts throughout science, but I think the former are much less frequently encountered outside theoretical computer science. I want to sketch the division between these two fields. In the future I’ll explain how it can be useful for reasoning about evolutionary biology.

So let’s start with some definitions, or at least intuitions.
Read more of this post

Open-ended evolution on hard fitness landscapes from VCSPs

There is often interest among the public and in the media about evolution and its effects for contemporary humans. In this context, some argue that humans have stopped evolving, including persons who have a good degree of influence over the public opinion. Famous BBC Natural History Unit broadcaster David Attenborough, for example, argued a few years ago in an interview that humans are the only species who “put halt to natural selection of its own free will”. The first time I read this, I thought that it seemed plausible. The advances in medicine that we made in the last two centuries mean that almost all babies can reach adulthood and have children of their own, which appears to cancel natural selection. However, after more careful thought, I realized that these sort of arguments for the ‘end of evolution’ could not be true.

Upon more reflection, there just seem to be better arguments for open-ended evolution.

One way of seeing that we’re still evolving is by observing that we actually created a new environment, with very different struggles than the ones that we encountered in the past. This is what Adam Benton (2013) suggests in his discussion of Attenborough. Living in cities with millions of people is very different from having to survive in a prehistoric jungle, so evolutionary pressures have shifted in this new environment. Success and fitness are measured differently. The continuing pace of changes and evolution in various fields such as technology, medicine, sciences is a clear example that humans continue to evolve. Even from a physical point of view, research shows that we are now becoming taller, after the effects of the last ice age faded out (Yang et al., 2010), while our brain seems to get smaller, for various reasons with the most amusing being that we don’t need that much “central heating”. Take that Aristotle! Furthermore, the shape of our teeth and jaws changed as we changed our diet, with different populations having a different structure based on the local diet (von Cramon-Taubadel, 2011).

But we don’t even need to resort to dynamically changing selection pressures. We can argue that evolution is ongoing even in a static environment. More importantly, we can make this argument in the laboratory. Although we do have to switch from humans to a more prolific species. A good example of this would be Richard Lenski’s long-term E-coli evolution experiment (Lenski et al., 1991) which shows that evolution is still ongoing after 50000 generations in the E-coli bacteria (Wiser et al., 2013). The fitness of the E. coli keeps increasing! This certainly seems like open-ended evolution.

But how do we make theoretical sense of these experimental observations? Artem Kaznatcheev (2018) has one suggestion: ‘hard’ landscapes due to the constraints of computational complexity. He suggests that evolution can be seen as a computational problem, in which the organisms try to maximize their fitness over successive generations. This problem would still be constrained by the theory of computational complexity, which tells us that some problems are too hard to be solved in a reasonable amount of time. Unfortunately, Artem’s work is far too theoretical. This is where my third-year project at the University of Oxford comes in. I will be working together with Artem on actually simulating open-ended evolution on specific examples of hard fitness landscapes that arise from valued constraint satisfaction problems (VCSPs).

Why VCSPs? They are an elegant generalization of the weighted 2SAT problem that Artem used in his work on hard landscapes. I’ll use this blog post to introduce CSPs, VCSPs, explain how they generalize weighted 2 SAT (and thus the NK fitness landscape model), and provide a way to translate between the language of computer science and that of biology.

Read more of this post

Minimal models for explaining unbounded increase in fitness

On a prior version of my paper on computational complexity as an ultimate constraint, Hemachander Subramanian made a good comment and question:

Nice analysis Artem! If we think of the fitness as a function of genes, interactions between two genes, and interactions between three genes and so on, your analysis using epistasis takes into account only the interactions (second order and more). The presence or absence of the genes themselves (first order) can change the landscape itself, though. Evolution might be able to play the game of standing still as the landscape around it changes until a species is “stabilized” by finding itself in a peak. The question is would traversing these time-dependent landscapes for optima is still uncomputable?

And although I responded to his comment in the bioRxiv Disqus thread, it seems that comments are version locked and so you cannot see Hema’s comment anymore on the newest version. As such, I wanted to share my response on the blog and expand a bit on it.

Mostly this will be an incomplete argument for why biologists should care about worst-case analysis. I’ll have to expand on it more in the future.

Read more of this post

Proximal vs ultimate constraints on evolution

For a mathematician — like John D. Cook, for example — objectives and constraints are duals of each other. But sometimes the objectives are easier to see than the constraints. This is certainly the case for evolution. Here, most students would point you to fitness as the objective to be maximized. And at least at a heuristic level — under a sufficiently nuanced definition of fitness — biologists would agree. So let’s take the objective as known.

This leaves us with the harder to see constraints.

Ever since the microscope, biologists have been expert at studying the hard to see. So, of course — as an editor at Proceedings of the Royal Society: B reminded me — they have looked at constraints on evolution. In particular, departures from an expected evolutionary equilibrium is where biologists see constraints on evolution. An evolutionary constraint is anything that prevents a population from being at a fitness peak.

Winding path in a hard semi-smooth landscape

In this post, I want to follow a bit of a winding path. First, I’ll appeal to Mayr’s ultimate-proximate distinction as a motivation for why biologists care about evolutionary constraints. Second, I’ll introduce the constraints on evolution that have been already studied, and argue that these are mostly proximal constraints. Third, I’ll introduce the notion of ultimate constraints and interpret my work on the computational complexity of evolutionary equilibria as an ultimate constraint. Finally, I’ll point at a particularly important consequence of the computational constraint of evolution: the possibility of open-ended evolution.

In a way, this post can be read as an overview of the change in focus between Kaznatcheev (2013) and (2018).
Read more of this post