A detailed update on readership for the first 200 posts

It is time — this is the 201st article on TheEGG — to get an update on readership since our 151st post and lament on why academics should blog. I apologize for this navel-gazing post, and it is probably of no interest to you unless you are really excited about blog statistics. I am writing this post largely for future reference and to celebrate this arbitrary milestone.

The statistics in this article are largely superficial proxies (what does a view even mean?) and are only notable because of how easy they are to track. These proxies should never be used to seriously judge academics, but I do think they can serve as a useful self-tracking tool. Making your blog’s statistics publicly available can give other bloggers a useful point of comparison for what sort of readership and posting habits are typical. In keeping with this rough and lighthearted comparison, according to Jeromy Anglim’s order-of-magnitude rules of thumb, in the year since the last update the blog has been popular in terms of RSS subscribers and relatively popular in terms of annual page views.

As before, I’ll start with the public self-metrics of the viewership graph for the last 6 and a half months:

Columns are views per week at TheEGG blog since the end of August, 2014. The vertical lines separate months, and the black line is average views per day for each month. The scale for weekly views is on the left; the daily averages use a different scale and are labeled at each height.

If you’d like to know more, dear reader, then keep reading. Otherwise, I will see you on the next post!

Unfortunately, the above graph does not go all the way to the previous update of March 18th, 2014. This is partially because I forgot to take screenshots of the stats page and partially because there were only 7 posts in the period of March 25th – May 4th, followed by a nearly 4 month silence until my apology on August 29th.

The history of all the views to (blue) and words written on (red) TheEGG since the blog's start on September 1st, 2011. The 'x's on the red line correspond to the publication of posts.

If you really want to see a historical time series, then refer to the figure at right of the views and total words in posts going back to the start of the blog. You might notice that shortly after my hiatus, the total views finally surpassed the cumulative words. Here the figure might be misleading you slightly, since the views are to all pages and the front-page/archive while the words exclude pages. I excluded the pages because their content is less original; for instance, the references page has over 24 thousand words but is a purely navigational tool with no added content over the original posts it links to. If we add the pages, then the total words on TheEGG have just passed 300 thousand.

Top posts

Given that TheEGG’s archive is growing sizable, I created an about & highlights page that breaks down the articles into 7 rough themes: (i) Algorithmic theory of biology; (ii) Bounded rationality in economics and finance; (iii) Cognitive science and philosophy of mind; (iv) Evolutionary game theory; (v) Mathematical oncology and theoretical biology; (vi) Metamodeling and philosophy of science; and (vii) Theoretical computer science and machine learning. For each section I selected 3 articles, based mostly on viewership, to give a new visitor about a 10% sample of posts to start with. However, in keeping with the tradition of my previous updates, below is the list of the top 10% of the posts on the blog by total viewership; 7 are from the last 50, and 13 return from the top 15 of the previous update.

  1. Defining empathy, sympathy, and compassion (19,471)
  2. Critical thinking and philosophy (17,821)
  3. Machine learning and prediction without understanding (10,676)
  4. Three types of mathematical models (8,409)
  5. Hunger Games themed semi-iterated prisoner’s dilemma tournament (8,328)
  6. Models and metaphors we live by (6,606)
  7. Toward an algorithmic theory of biology (6,573)
  8. Should we be astonished by the Principle of “Least” Action? by Abel Molina (5,823)
  9. Software through the lens of evolutionary biology (5,658)
  10. Evolution is a special kind of (machine) learning (5,484)
  11. Transcendental idealism and Post’s variant of the Church-Turing thesis (5,327)
  12. Micro-vs-macro evolution is a purely methodological distinction (5,127)
  13. Bounded rationality: systematic mistakes and conflicting agents of mind (4,812)
  14. Monoids, weighted automata and algorithmic philosophy of science (4,613)
  15. Is Chaitin proving Darwin with metabiology? (4,430)
  16. Weapons of math destruction and the ethics of Big Data (3,972)
  17. Are all models wrong? (3,859)
  18. Four color problem, odd Goldbach conjecture, and the curse of computing (3,448)
  19. Programming language for biochemistry (3,445)
  20. Personification and pseudoscience (3,397)

Since this is the fourth update on readership, we can try to track some trends on the total views and top 10% from the previous updates on February 5, 2013; August 4, 2013; and March 18, 2014. The total views went from 16,751 to 79,722 to 159,455 to 316,565; the views threshold for being in the top 10% went from 559 (1.67x the approximate mean) to 1,633 (2.05x) to 2,152 (2.02x) to 3,397 (2.15x); the views for the top post went from 2,863 (8.55x) to 7,200 (9.03x) to 8,086 (7.61x) to 19,471 (12.3x). The fraction of all post views (excluding the front page) to the top 10% of the posts went from less than ~47% (I did not exclude the front page in the first stats update, so can only estimate) to 51% to 46% to 48%. I don’t know what any of these numbers mean, but hey: numbers!
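The multipliers in parentheses above can be checked with a few lines of arithmetic. This is only a minimal sketch: it assumes the four updates fell at 50, 100, 150, and 200 posts, and that the "approximate mean" is total views divided by the number of posts. Neither assumption is spelled out in the paragraph, and the function name `ratios` is made up for illustration.

```python
# Figures quoted in the paragraph above. The post counts (50, 100, 150, 200)
# are an assumption about when each update was written.
updates = [
    # (assumed posts, total views, top-10% threshold, top post views)
    (50, 16_751, 559, 2_863),
    (100, 79_722, 1_633, 7_200),
    (150, 159_455, 2_152, 8_086),
    (200, 316_565, 3_397, 19_471),
]


def ratios(posts, total_views, threshold, top_views):
    """Return (threshold / mean, top / mean), with mean = total views / posts."""
    approx_mean = total_views / posts
    return round(threshold / approx_mean, 2), round(top_views / approx_mean, 2)


table = [ratios(*row) for row in updates]
for (posts, *_), (thr, top) in zip(updates, table):
    print(f"{posts:>3} posts: threshold {thr:.2f}x, top post {top:.2f}x")
```

Under those assumptions the computed ratios match the ones quoted in the text, which suggests the "approximate mean" includes front-page views in the numerator.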

Guests and contributors

What I am most excited about is seeing Abel’s post in the top 10% of viewership. TheEGG was originally intended as a group blog (hence the last G standing for group), and I have worked hard to get authors other than myself to contribute writing. As you can see on the sidebar, TheEGG has content from 15 authors and they’ve contributed from all over the globe:

TheEGGmap

In the last year we’ve had posts from the following fine folks:

Percentage of the total article words on TheEGG written by Artem Kaznatcheev since the start of the blog. Boxes represent new posts.

In numbers, this means that 16% of the last 50 posts were not written by me. A good start, but I need to encourage other bloggers to contribute more to make the blog a group effort. On the right, you can see a graph of the proportion of the blog’s words (excluding static pages) written by me. As you can see, there was a point in late 2011 and 2012 when the majority of the words weren’t written by me. In fact, for a little bit the majority of the words were written by Julian Xue, and then the blog became more pluralized with contributions from Julian, Marcel, Tom, and me. Unfortunately, Julian has been too busy to contribute a post since October 13, 2011, but I am happy to tease that he is currently working on a couple of drafts for TheEGG.

Since the start of 2012, however, there has been a relatively steady increase in the fraction of the blog written by me. It seems to have now stabilized at around 82%. Hopefully in the next 50 posts, we can push this down to 75%. If you think that your writing would be a good fit for TheEGG then contact me and let me know, and we can see if it fits the blog’s vision and standards.

Post properties

Cumulative distribution of views (red) and words (blue) for the 200 posts on TheEGG. Median and mean for views and words are shown with vertical lines; an intersecting horizontal line is added to read off the mean post's percentile.

Above we have a lot of statistics about the blog overall, but not much on individual posts. From having written most of the posts, I have a vague feeling of what the ‘typical’ post looks like, but I decided to also calculate some statistics. At the right is the graph of two cumulative distributions, in red is views and in blue is words. The graph is truncated at 2500 views/words, because plotting the outliers would obscure the focus on the typical post.

The two vertical blue lines correspond to the median and mean of the distribution of words. The mean is slightly higher than the median; from reading the horizontal blue intercept, we can see that the mean is at the 52nd percentile and around 1393 words; the median length post is “False memories and journalism” at 1358 words. This close correspondence between mean and median is not surprising because I explicitly aim to have posts in the range of 1k to 1.5k words. I try to split most excessively long posts into self-contained parts.

Viewership, however, is much more skewed. The two vertical red lines correspond to the median and mean of the distribution of views. The mean is far higher than the median. From reading the horizontal red intercept, we can see that the mean is just past the 75th percentile: 75% of the posts have fewer views than average. In terms of views, the median post is the transcript of my old TEDxMcGill Talk on evolving cooperation, with 681 views. A mean post, however, has around 1409 views, similar to the post “Game theoretic analysis of motility in cancer metastasis”.
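To see why a skewed distribution puts the mean at such a high percentile, here is a small sketch. The view counts are invented for illustration and are not TheEGG's data, and `percentile_of_mean` is a hypothetical helper name.

```python
from statistics import mean, median


def percentile_of_mean(values):
    """Fraction of values strictly below the sample mean."""
    m = mean(values)
    return sum(v < m for v in values) / len(values)


# Hypothetical view counts (not real data): many modest posts and one big hit.
views = [120, 300, 450, 600, 700, 900, 1_100, 19_000]

print(median(views))              # middle of the sorted values
print(mean(views))                # pulled upward by the single outlier
print(percentile_of_mean(views))  # share of posts below the mean
```

A single popular post drags the mean well above the median, so most posts end up "below average", which is the same pattern the red lines show in the figure.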

This skew of a small fraction of posts bringing in most views is not too surprising given how most traffic arrives on TheEGG. Of all the referrals to the blog, Reddit is by far the most prominent; it brings in around 97k views, with the most common source subreddits (other than the front page) being /r/programming (5.5k), /r/math (5.2k), /r/compsci (3.8k), and /r/philosophy (3.7k). I am a little surprised not to see my favorite subreddit /r/PhilosophyofScience in that list. The second most common source of traffic is searches, with around 37k views. If you are curious, the most common search terms are variants on ‘types of mathematical models’, ‘spatial structure’, and ‘metabiology’. Social media follows in third with G+ at 4.7k, Twitter at 4.3k, and Facebook at 4.1k.

Unsurprisingly, more recent posts on the blog have higher total readership than older ones. I like to imagine that this is due to a gradual increase in the quality of content, but it might be just time and better promotion. In the left panel below, there is the total views to each post to date versus the time it was published. Note that the views scale is logarithmic with base 10. The fit is an exponential, which has a doubling time of around 1.5 years.
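An exponential fit like the one described above amounts to a straight-line fit on the log-scaled views, and the doubling time follows from the slope. The sketch below is one way to do this with ordinary least squares. It is not necessarily the fitting method used for the figure, and the data are synthetic, generated to double exactly every 1.5 years.

```python
import math


def doubling_time(times, views):
    """Least-squares fit of log10(views) = a + b * t; returns log10(2) / b."""
    logs = [math.log10(v) for v in views]
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(logs) / n
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
             / sum((t - t_bar) ** 2 for t in times))
    return math.log10(2) / slope


# Hypothetical posts (not real data): publication time in years since the
# blog began, with views constructed to double every 1.5 years.
years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
views = [200 * 2 ** (t / 1.5) for t in years]

print(round(doubling_time(years, views), 3))
```

With real per-post data the points would scatter around the line, so the recovered doubling time would only be an estimate.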

Two views of total views per post on the blog. On the left are the total views to date versus the date the post was published; on the right are the total views to date versus the length of the post in words. Note that the views axis is scaled as log base 10.

The panel at right has views versus words. This is mostly to entertain David Basanta’s question on whether longer posts garner a smaller readership. It seems that up to around 1k words, there is an increase in readership; this is probably because it is hard to express an interesting idea in less than 1k words. After about 1k, length does not seem to affect the viewership much, although maybe I should test this more carefully. It is, however, possible to do some fitting to find the ‘sweet-spot’ for word length, and depending on which method I use, it is somewhere between 1860 and 2070 words. For comparison, this post (which, for obvious reasons, is not included in these stats) is 3021 words long and will probably be viewed less than 302 times in the next year.

Cumulative distribution of days since last post.

Finally, on the regularity of the blog. Although there are some notable long gaps between posts, it seems that the typical (i.e. median) gap is about half a week. In particular, over 53% of the posts come 3 days or less after their predecessor. My personal goal is to post at least once every 7 days; this resolution is violated less than 14% of the time. For example, so far this year there has only been one gap of over a week: a two week gap while I prepared the last post on pairing tools and problems. At the right, you can see a cumulative distribution of the post delay times. The distribution is truncated and does not display the 5 times TheEGG had a silence of greater than a month.
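The regularity figures above (median gap, share of posts within 3 days, share of gaps over a week) can be computed from a list of publication dates. The following is a minimal sketch with made-up dates; `gap_stats` and its `goal_days` parameter are hypothetical names chosen for the example.

```python
from datetime import date
from statistics import median


def gap_stats(post_dates, goal_days=7):
    """Median gap in days, share of gaps <= 3 days, share of gaps > goal_days."""
    ordered = sorted(post_dates)
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]
    return (median(gaps),
            sum(g <= 3 for g in gaps) / len(gaps),
            sum(g > goal_days for g in gaps) / len(gaps))


# Hypothetical publication dates, for illustration only.
dates = [date(2015, 1, 1), date(2015, 1, 3), date(2015, 1, 6),
         date(2015, 1, 13), date(2015, 1, 15), date(2015, 1, 29)]

print(gap_stats(dates))
```

The third value is the fraction of times the once-a-week resolution would have been broken in this toy sample.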

Community

Of course, the most rewarding and important part of the blog is not the easy to track properties like view and word counts, but the community of readers. It is tempting to take the 2,417 followers that TheEGG has on WordPress and via email as a measure of the size of community, but I think that would be an unreasonable overestimate. A lot of the WordPress accounts that follow this blog, for instance, seem to be small businesses offering various products/services that have nothing to do with the content of TheEGG. My pessimism leads me to suspect there is some SEO trick that is being exploited here, or maybe a convention that I am unaware of (and thus don’t adhere to) of following back the people that follow you? I’ve tried to keep track manually of the people that engage with TheEGG on social media and through commenting, by creating a G+ circle of engagers. A lot of engagement that I see comes from Twitter, and a lot of tweeps don’t have G+ accounts, so this circle is an underestimate of the community. If you are a regular reader or occasional sharer or commenter and I forgot to include you in the circle, then let me know and I will add you to it; that way other readers can find you!

Cumulative distribution of comments on (blue) and pingbacks to (red) posts on TheEGG.

A better metric for community might be the number of comments on the blog, since that reveals the readers that are passionate enough to write responses and suggestions. Here, the statistics don’t look too impressive. There are 701 comments (225 of them by me) and 989 pingbacks (most of them internal to the blog). On the right is the cumulative distribution for the comments per post in blue and pingbacks per post in red. On 33.5% of the posts there are no comments, and on 57.5% of the posts there are 2 or fewer comments; often this is a single reader’s comment with my response.

The most commented on post is “Kooky history of the quantum mind” with 27 comments. The discussion on that post was incredibly useful to me, in that it led me to learning about Hoffman’s interface theory of perception, a find that has been directly useful to my research. That thread has also gotten me in a bit of trouble in a later email discussion with Hoffman. I was very critical when I first saw Hoffman’s talk, mistaking parts of it for new-age pseudoscience, and expressed that sentiment in the comments. (Coincidentally, the most commented on post in the year since the last stats update was my post on pseudoscience, with 20 comments.) My comment was then picked up by a (popular?) skeptic forum and eventually reached Hoffman. So yes, dear reader, there are some dangers to blogging, but they are greatly outweighed by the benefits. I do fear, though, that my harsh commenting might be driving off some potential interlocutors, so I am trying to work on my tone in discussion.

The post with the most pingbacks at 29 (and also with the third-highest number of comments at 18) is on the three types of mathematical models. This is expected, since I often link to that post for definitions of heuristics and insilications. Maybe I should create a glossary of terms.

The most memorable experiences with the community of readers, however, are qualitative ones. For example, the Twitter and email reaction to my post on Bernstein polynomials and the public good that led to the guest post by Philip Gerlee and Philipp Altrock. Or when a fellow researcher that I just met mentions that they read the blog, or pulls up a copy of a post from their EndNote.

I was also extremely honored last year when TheEGG was listed as one of the top 30 computer science and programming blogs alongside wonderful blogs that I often frequent. I was particularly excited about their summary of the content here:

This blog weaves together computer science, the theory of evolution, and game theory into a masterpiece of interdisciplinary research.

Hopefully I can maintain this site as a space worth visiting.

Going forward

As I’ve mentioned earlier in this article, one of my goals is to introduce more contributors to the blog and feature more writing from other researchers. I also feel like I have fallen behind in my reading and commenting on others’ blogs, and so will aim to engage more with the blog-o-sphere to widen the community beyond this site. Obviously, this has to be balanced against my other commitments. Although this blog is a part of my research workflow, it is only a small part and one that is not particularly rewarded by the traditional academy.

How would you like to see TheEGG develop, dear reader?

About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

4 Responses to A detailed update on readership for the first 200 posts

  1. Jon Awbrey says:

    Speaking of naval gazing, there’s an issue I’ve been concerned with for a very long time, since the first round of the Perceptron debates, in fact. I have struggled over the years to pinpoint the crux of the problem, so this will be just one more try.

    More attention needs to be paid to the difference between, or the dimension that stretches between two types of models. There are mathematical models that depend very heavily on high precision real number computation and there are models that depend very heavily on complex graph-theoretic data structures. Everything we see in practice will do some numerical computation and some pointer manipulation, of course, so it’s a matter of arraying models according to how they represent the state of the object system, how they store the lion’s share of information about the state and how they compute the state evolution.

    To be continued …

  2. Pingback: Cataloging a year of blogging | Theory, Evolution, and Games Group
