Will the droids take academic jobs?

As a researcher, one of the biggest challenges I face is keeping up with the scientific literature. This is further exasperated by working in several disciplines, and without a more senior advisor or formal training in most of them. The Evolutionary Game Theory Reading Group, and later this blog, started as an attempt to help me discover and keep up with the reading. Every researcher has a different approach: some use the traditional process of reading the table of contents and abstracts of selective journals, others rely on colleagues, students, and social media to keep them up to date. Fundamentally, these methods are surprisingly similar and it is upto the individual to find what is best for them. I rely on blogs, G+, Google Scholar author alerts, extensive forward-citation searches, surveys, and most recently: Google Scholar updates.


The updates are a computer filtering system that uses my publications to gage my interests and then suggests new papers as they enter Google’s database. Due to my limited publication history, the AI doesn’t have much to go on, and is rather hit or miss. Some papers, like the Requejo & Camacho reference in the screeshot above, have led me to find useful papers on ecological games and hurry my own work on environmental austerity. Other papers, like Burgmuller’s, are completely irrelevant. However, this recommendation system will improve with time, I will publish more papers to inform it and the algorithms it uses will advance.

Part of that advancement comes from scientists optimizing their own lit-review process. Three days ago, Davis, Wiegers et al. (2013) published such an advancement in PLoS One. The authors are part of the team behind the Comparative Toxicogenomics Database that (emphasis mine):

promotes understanding about the effects of environmental chemicals on human health by integrating data from curated scientific literature to describe chemical interactions with genes and proteins, and associations between diseases and chemicals, and diseases and genes/proteins.

This curation requires their experts to go through thousands of articles in order update the database. Unfortunately, not every article is relevant and there are simply too many articles to curate everything. As such, the team needed a way to automatically sort which articles are most likely to be useful and thus curated. They developed a system that uses their database plus some hand-made rules to text-mine articles and assign them a document relevance score (DRS). The authors looked at a corpus of 14,904 articles, with 1,020 of them having been considered before and thus serving as a calibration set. To test their algorithms effectiveness, 3583 articles were sampled at random from the remaining 13884 and sent to biocurators for processing. The DRS correlated well with probability of curation, with 85% curation rate for articles with high DRS and only 15% for low DRS articles. This outperformed the PubMed ranking of articles which resulted in a ~60% curation rate regardless of the PubMed ranking.

With the swell of scientific publishing, I think machines are well on their way to replacing selective journals, graduate students, and social media for finding relevant literature. Throw in that computers already make decent journalists; and you can go to SCIgen to make your own AI authored paper that is ‘good’ enough to be accepted at an IEEE conference; and your laptop can write mathematical proofs good enough to fool humans. Now you have the ingredients to remind academics that they are at risk, just like everybody else, of losing their jobs to computers. Still it is tempting to take comfort from technological optimists like Andrew McAfee and believe that the droids will only reduce mundane and arduous tasks. It is nicer to believe that there will always be a place for human creativity and innovation:

For now, the AI systems are primarily improving my workflow and making researcher easier and more fun to do. But the future is difficult to predict, and I am naturally a pessimist. I like to say that I look at the world through algorithmic lenses. By that, I mean that I apply ideas from theoretical computer science to better understand natural or social phenomena. Maybe I should adopt a more literal meaning; at this rate “looking at the world through algorithmic lenses” might simply mean that the one doing the looking will be a computer, not me.

ResearchBlogging.orgDavis, A., Wiegers, T., Johnson, R., Lay, J., Lennon-Hopkins, K., Saraceni-Richards, C., Sciaky, D., Murphy, C., & Mattingly, C. (2013). Text Mining Effectively Scores and Ranks the Literature for Improving Chemical-Gene-Disease Curation at the Comparative Toxicogenomics Database PLoS ONE, 8 (4) DOI: 10.1371/journal.pone.0058201