Generating random power-law graphs

‘Power-law’ is one of the biggest buzzwords in complexology. Almost everything is a power-law. I’ve even used it to sell my own work. But most work that deals in power-laws tends to lack rigour. And just establishing that something is a power-law shouldn’t make us feel that it is more connected to something else that is a power-law. Cosma Shalizi — the great critic of sloppy thinking in complexology — has an insightful passage on power-laws:

[T]here turn out to be nine and sixty ways of constructing power laws, and every single one of them is right, in that it does indeed produce a power law. Power laws turn out to result from a kind of central limit theorem for multiplicative growth processes, an observation which apparently dates back to Herbert Simon, and which has been rediscovered by a number of physicists (for instance, Sornette). Reed and Hughes have established an even more deflating explanation (see below). Now, just because these simple mechanisms exist, doesn’t mean they explain any particular case, but it does mean that you can’t legitimately argue “My favorite mechanism produces a power law; there is a power law here; it is very unlikely there would be a power law if my mechanism were not at work; therefore, it is reasonable to believe my mechanism is at work here.” (Deborah Mayo would say that finding a power law does not constitute a severe test of your hypothesis.) You need to do “differential diagnosis”, by identifying other, non-power-law consequences of your mechanism, which other possible explanations don’t share. This, we hardly ever do.

The curse of this multiple-realizability comes up especially when power-laws intersect with the other great field of complexology: networks.

I used to be very interested in this intersection. I was especially excited about evolutionary games on networks. But I was worried about some of the arbitrary seeming approaches in the literature to generating random power-law graphs. So before starting any projects with them, I took a look into my options. Unfortunately, I didn’t go further with the exploration.

Recently, Raoul Wadhwa has gone much more in-depth in his thinking about graphs and networks. So I thought I’d share some of my old notes on generating random power-law graphs in the hope that they might be useful to Raoul. These notes are half-baked and outdated, but maybe still fun.

Hopefully, you will find them entertaining, too, dear reader.

What is a good definition of a random power-law graph? This is a difficult question because it has many words that seem like they are precisely defined but aren’t. Let’s start with ‘random’. This is extremely vague, but in practice, it is often used to mean something like ‘unbiased’ or ‘unstructured’ or ‘unparticular’. Of course, it can never really mean that since randomness is a great way to generate structure. But the best we can do to work towards ‘unbiased’ is to take a look at what we mean with existing more cleared defined objects like random k-regular graphs.

So let’s look at how random is defined here.

First, what is a k-regular graph? It a graph on n vertices where each vertex has exactly k neighbours. These don’t always exist, so for simplicity let us suppose that 3 \leq k \leq n and that kn is even. If we wanted to look at the degree distribution of such a graph then it would be zero everywhere except for a mass of 1 at k.

Now, what does it mean to be a random k-regular graph? Given any particular graph, one cannot say it is random. Instead, one talks about a distribution over graphs and links it to counting. The distribution of random k-regular graphs, usually called \mathcal{G}_{n,k}, is then the uniform distribution over all k-regular graphs on n vertices. In other words, if there are M_{n,k} many k-regular graphs then the probability of any one of them is \frac{1}{M_{n,k}}.

If we want to show that we have a randomized algorithm that generates a random k-regular graph then we need to prove that given n and k as input, the algorithm proceeds to produce any specific k-regular graph on n vertices with a probability of \frac{1}{M_{n,k}}. Or if we are willing to be happy with approximate algorithms then we want the probability to be approximately \frac{1}{M_{n,k}}. If we cannot prove that our algorithm has this property then we should not say that it generates random k-regular graphs. Instead, we should just say it generates k-regular graphs from an unknown distribution. This is not desirable since we probably don’t know — nor have a good way to learn — the biases of this unknown distribution.

Thankfully, there are some straightforward algorithms for generating random k-regular graphs and Marcel Montrey has discussed them before on TheEGG.

How would we adapt the definition of random k-regular graphs to random power-law networks?

The issue is that ‘power-law’ does not specify a particular list of degrees. Instead, power-law specifies a distribution of degrees from which we can imagine the set of degrees of our particular graphs is sampled. Hence, the first step in thinking about a random power-law graph is to make a random power-law list of degrees. We can do this by picking the parameters of our power-law — which is not itself an unambiguous decision — and thus fixing a particular distribution D_P.

We can then proceed to take n many samples d_i \approx D_P, and call the resulting list d_1, ..., d_n as our list of degrees. We would have a k-regular graph is every d_i = k, but obviously, this is an extremely unlikely event. But we can still finish the definition of a random power-law graph in the same way as a random regular graph. i.e. let us just take the uniform distribution over all graphs that have the degree list (d_1, ..., d_n).

So to reiterate: a particular graph is random power-law graph with power-law D_P if it has the same probability of being generated the same as the following two-stage process: (1) sample a degree list d_1,...,d_n \approx D_P and (2) take the uniform distribution over all graphs with degree list (d_1,...,d_n).

This is where we hit our first hurdle: not every (d_1,...,d_n) can be realized by some graph. Some degree lists are simply impossible.

Thankfully, there is a straightforward algorithm for checking if a given degree list is legal. In the case of simple graphs, one can use the Havel-Hakimi. But for simplicity, I want to focus on the case where multiple edges are allowed between a pair of vertices. In that case, we can use an alternative algorithm that I’ve described it before on the computer science StackExchange.

Create a complete graph K_n on n vertices. For each vertex v_i in K_n, split it into d_i copies. Split here means, create a number of copies with edges to every vertex v_i has an edge to, but no edges to other copies of v_i. If d_i = 0 then simply remove the vertex. In the new graph, call these vertices v_{ij} for 1 \leq j \leq d_i.

Once you are done, you have a very dense graph on N = d_1 + ... + d_n vertices; call this graph H. Pick your favorite algorithm for maximum matching (since the graph is so dense, you should probably use one of the fast matrix-multiplication based algorithms) and run it on H. This will return a matching M. If the matching is not perfect (i.e. if it does not cover every vertexes) then your degree distribution was impossible; so return no.

If you have a perfect matching M, then remove all edges not in M from H, and then for every 1 \leq i \leq n merge the d_i many vertices $latex v_{i1}, ... , v_{id_i} into one vertex u_i. Merging two vertices means combining them into one, such that the resulting vertex has edges to every vertex at the original had an edge to. If two originals had an edge to the same vertex u_l then the merged u_i will have two edges between u_i and u_l.

Call the resulting graph G; it has the desired degree distribution.

The resulting runtime is O(N^\omega) where $\omega$ is the constant for the fastest matrix-multiplication algorithm (which at the time of writing is about 2.373). In terms of number of vertices in the resulting graph, in the worst case of degree distribution being dense, we have O(n^{2\omega}).

This gives a particular graph. But we wanted a random one.

The reason I presented the above algorithm instead of Havel-Hakimi is because I think it is relatively straightforward to modify into an algorithm for sampling random (not necessarily simple) power-law graphs.

In particular, all we need to do is replace our favorite algorithm for finding a perfect matching by an algorithm for sampling a perfect matching uniformly at random.

For this, you could use the classic algorithm of Jerrum & Sinclair (1989):

Jerrum, M., & Sinclair, A. (1989). Approximating the permanent. SIAM Journal on Computing, 18(6), 1149-1178.

Unfortunately, Jerrum-Sinclair algorithm relies on rappidly mixing Markov chains and thus only gives an approximation to uniform sampling. Perfect uniform sampling is probably impossible due to the correspondence between uniform sampling and counting, and the fact that counting the number of matchings is ♯P-complete.

Of course, this doesn’t mean that it is impossible to perfectly sample random power-law graphs. I mostly just wanted to provide the above sketch of an algorithm to show how finding an algorithm for the seemingly obvious question of generating a random power-law graphs quickly leads to some deep computer science. At least if we want precise and rigorous results.

The approach I sketch above is not very good. In particular, the limitation to non-simple graphs is a bit annoying. For better work, I would recommend the following papers:

Viger, F., & Latapy, M. (2005). Efficient and simple generation of random simple connected graphs with prescribed degree sequence. Computing and Combinatorics, 440-449.

Blitzstein, J., & Diaconis, P. (2011). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics, 6(4), 489-522.

I think that people have even plaid with implementations of these algorithms in practical experimental settings:

Gkantsidis, C., Mihail, M., & Zegura, E. (2003). The Markov Chain simulation method for generating connected power law random graphs. In Proc. 5th Workshop on Algorithm Engineering and Experiments (ALENEX). (pdf)

As for the ideal of sampling a degree sequence from a power-law distribution and create a graph with that degree sequence. It seems that physicists have done some similar things non-rigorously:

Catanzaro, M., Boguñá, M., & Pastor-Satorras, R. (2005). Generation of uncorrelated random scale-free networks. Physical Review E, 71(2), 027103.

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

2 Responses to Generating random power-law graphs

  1. Philip Gerlee says:

    The work of Svante Janson might be of interest e.g.

    • Thanks for the link! It is indeed very relevant to this post since it fixes the non-simple graph problem in my sketch. I will have to read it closely. I am glad to see that this field is active.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: