Big data, prediction, and scientism in the social sciences

Much of my undergrad was spent studying physics, and although I still think that a physics background is great for a theorist in any field, there are some downsides. For example, I used to make jokes like: “soft isn’t the opposite of hard sciences, easy is.” Thankfully, over the years I have slowly started to grow out of these condescending views. Of course, apart from amusing anecdotes, my past bigotry would be of little importance if it weren’t shared by a surprising number of grown-up physicists. For example, Sabine Hossenfelder, an assistant professor of physics in Frankfurt, writes in a recent post:

[Comic: “If you need some help with the math, let me know, but that should be enough to get you started! Huh? No, I don't need to read your thesis, I can imagine roughly what it says.”]

It isn’t so surprising that social scientists themselves are unhappy because the boat of inadequate skills is sinking in the data sea and physics envy won’t keep it afloat. More interesting than the paddling social scientists is the public opposition to the idea that the behavior of social systems can be modeled, understood, and predicted.

As a blogger, I understand that we can sometimes be overly bold and confrontational. Since blogs are an informal medium, I have no fundamental problem with such strong statements, or even straw-men, if they are part of a productive discussion or critique. If there is no useful discussion, I would normally just make a small comment or ignore the post completely. This time, however, I decided to focus on Hossenfelder’s post because it highlights a common symptom of interdisciplinitis: an outsider thinking that they are addressing people’s critiques (usually by restating an obvious and irrelevant argument) while completely missing the point. Also, her comments serve as a nice bow to tie together some thoughts that I’ve been wanting to write about recently.

In Hossenfelder’s case, the point she is missing, and inadvertently illustrating, is the danger of scientism in the social sciences. The lesser danger is belittling the methods used by social scientists. It is not uncommon for physicists or mathematicians to bring in heavy-duty mathematical tools and argue that they should be listened to because their tools are fancy: relevance by intimidation. Of course, sometimes these tools prove useful and alter the landscape of the fields they are introduced into, but most of the time they either disappear or form a methodological ghetto. At times these ghettos grow into subfields of their own, like econophysics and network science, that develop nearly independently of the discipline they wanted to affect.

[Figure: citation network of the small-world networks literature, from Freeman (2004)]

A great illustration of this is the pattern of citations in the small-world networks literature (from Freeman, 2004), shown on the left. Papers by sociologists are represented by white dots, those by physicists by black dots, and all others by grey dots. It is easy to see that there are two distinct clusters that barely communicate with each other. I am not sure how such segmentation is productive: unless the physicists transition completely to being sociologists, they are not actually moving the study of society forward, because their work remains unknown or unimportant to practicing sociologists. This is why I believe that if you are entering a new field, you should do so as a connector: try your best to make any new tools you bring as simple and well justified as possible, and make sure you understand exactly which problems the field wants to answer instead of imagining your own.

Methodological intimidation concerns only the dynamics among fellow scientists and is thus of minor importance. The real issue is using the undue authority of science to drive social change. If you don’t think science has authority in society, then just look at peddlers of homeopathic medicine, who appeal to the authority of ‘scientifically tested’ to sell their products. This sort of scientism is what people fear and critique when they cry ‘social engineering!’ at the end of physicists’ posts on social issues. It is also the point that Hossenfelder misses completely when she writes (emphasis mine):

the only way we can solve the problems that mankind faces today — the global problems in highly connected and multi-layered political, social, economic and ecological networks — is to better understand and learn how to improve the systems that govern our lives.

This is the real point of people who are opposed to physicists’ naive views, and the point most frequently missed by physicists climbing into the social sciences. As Cathy O’Neil writes in the context of economics: “actual scientists are skeptical, even of their own work, and don’t pretend to have error bars small enough to make high-impact policy decisions based on their fragile results.” Instead, being ‘scientific’ is often used as a show of power in our society, and if you climb into a poorly understood system and start “making it better”, you are likely to make it awful. This is especially true if you have no respect for human autonomy or dignity (regardless of your views on free will). If you want to be scientific, then you should focus on understanding the system you study, not changing it. Kepler, Galileo, and Newton didn’t aim to improve the paths of the wandering stars, just to understand them. Only after a level of understanding was achieved did a derived engineering develop.

For something as important as social policy, you should first have a reasonable understanding of the system you are trying to affect before you turn to scientism to support your pet policies. In the process of expanding your understanding, however, you should not expect more information to make political opinions converge. In fact, “the more information partisans get, the deeper their disagreements become” (for details see Kahan et al., 2013). Yet among liberals the joke that “truth has a liberal bias” persists, and the assumption that more data, especially the trendy (and overhyped) ‘big data’, will create ‘good’ social policy is prevalent. This, of course, ignores the actual biases introduced into data by how we collect it, process it, and choose to act on its predictions.

Even in non-partisan settings where ‘improvement’ is unambiguous, such as Google Flu Trends (Ginsberg et al., 2009), it is easy to see the limits of prediction from big data. Google Flu Trends started with great success, but within a few years it was much less effective than the more traditional predictions by the Centers for Disease Control and Prevention that it was meant to surpass. Lazer et al. (2014) identified two primary problems: big data hubris and algorithmic dynamics. The first is the belief “that big data are a substitute for, rather than a supplement to, traditional data collection and analysis” (Lazer et al. 2014, pg. 1203). For me, this is one of the main differences between big data in the social sciences and data in more traditional observational sciences like astronomy. In astronomy, the type of data collected and the sort of questions asked are shaped by the scientists themselves. In the social sciences, however, the questions are often imposed from the outside (either by folk theories or by policy demands), and the big data is collected with other questions (usually related to the interests of a specific company) in mind. This makes interpretation, analysis, and reproduction fundamentally different between the fields. Only in exceptional cases are sound experimental or measurement design and machine learning combined.

Algorithmic dynamics further exacerbate the problem for big data. Data collection is often out of the researchers’ hands and controlled by an opaque entity that has specific interests in the processes generating the data. In the case of Google Flu Trends, for instance, countless improvements to search results (such as suggested searches) made by the search team made it very difficult for the prediction team to generate accurate predictions, because their measurement apparatus (the volume of specific search queries) was constantly changing without their knowledge (although both teams are in the same company, they are not in close contact). This makes the ‘raw data’ impossible to regenerate, and replication or reanalysis by other teams, along with the critical dialogue it would produce, becomes unimaginable. Compare this to astronomy, where countless scholars could pore over Ptolemy’s data and provide their own analysis, interpretation, and critiques, with the resulting dialogue developing the science.
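A toy sketch can make the measurement-drift problem concrete. Everything in it is hypothetical: invented weekly flu counts, a single made-up query-volume feature rather than anything resembling the actual Google Flu Trends model, and an assumed jump from 3 to 4.5 flu-related queries per case caused by an interface change. The model is fit while the query-to-flu relationship is stable and then applied, unchanged, after the instrument shifts.

```python
# Toy illustration of "algorithmic dynamics". All numbers are hypothetical;
# this is not Google Flu Trends data or its actual model.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    return mean_y - b * mean_x, b


# Hypothetical weekly flu incidence; the same true curve is used in both seasons.
flu = [10, 20, 35, 50, 40, 25, 15, 10]

# Season 1: assume searchers issue 3 flu-related queries per case.
queries_before = [3 * f for f in flu]
a, b = fit_line(queries_before, flu)

# Season 2: assume an interface change (e.g. suggested searches) raises that to
# 4.5 queries per case. The prediction team is not told and keeps the old fit.
queries_after = [4.5 * f for f in flu]
predicted = [a + b * q for q in queries_after]

for true_value, guess in zip(flu, predicted):
    print(f"true {true_value:5.1f}   predicted {guess:5.1f}")
```

In this toy, every prediction comes out at 1.5 times the true value even though the flu curve itself never changed. The fitted relationship is still "correct" for the old instrument, and nothing in the query volume alone signals that the instrument, not the phenomenon, moved.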

In the above example of flu outbreak prediction, there is relatively little incentive to game the system or profiteer. If you really want to see the authority of science or math abused to mislead and make a profit, then finance is ready to please. I’ve already highlighted the danger of hiding lies in complex derivatives, and I’m not alone: there is even a forthcoming book on Weapons of Math Destruction. One of the simplest such weapons is the pseudo-mathematical misuse of backtesting among financial advisors (Bailey et al., 2014). Here, the advisor presents some mathematical model, sometimes simple and sometimes complex, for predicting when it’s best to buy or sell a given asset (or for some other investment decision). He assures you of its soundness by showing you the great Sharpe ratio it has on historical stock data and a statistical test guaranteeing its significance. What he neglects to tell you about are the thousands (or more, with modern computers) of other candidate models he considered, and the lack of any out-of-sample testing. The result is a model that overfits its training set and immediately fails on new data. However, even when such predictions consistently and reliably fail, the community of advisors is able to explain away the issue with more appeals to science, like the efficient-market hypothesis: “the market has found the hidden effect and arbitraged away the profits”. In reality, the original model never detected any actual regularity, but the advisor was able to hide this behind a veil of scientism.
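The backtest-overfitting trap is easy to reproduce in a few lines. The sketch below is a toy, not the procedure from Bailey et al. (2014). Market returns are simulated as pure noise. Each hypothetical candidate "model" is a rule that goes long or short depending on a randomly weighted sum of the last five daily returns. The "advisor" keeps whichever candidate has the best in-sample Sharpe ratio, taken here as mean return divided by its standard deviation with a zero risk-free rate. The parameter values (200 candidates, 250 days per period, √250 annualization) are illustrative assumptions.

```python
import random


def sharpe(returns):
    """Per-period Sharpe ratio: mean return divided by its standard deviation.

    The risk-free rate is assumed to be zero to keep the sketch short."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean / var ** 0.5 if var > 0 else 0.0


def rule_returns(weights, returns):
    """Daily returns of the rule: long if the weighted sum of the last k returns is positive, else short."""
    k = len(weights)
    result = []
    for t in range(k, len(returns)):
        signal = sum(w * returns[t - 1 - i] for i, w in enumerate(weights))
        position = 1 if signal > 0 else -1
        result.append(position * returns[t])
    return result


def backtest_experiment(n_candidates=200, n_days=250, lookback=5, seed=0):
    """Select the best of many candidate rules in-sample, then score it out-of-sample.

    Market returns are pure noise, so no rule has any genuine edge."""
    rng = random.Random(seed)
    in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    out_of_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    best_sharpe, best_weights = float("-inf"), None
    for _ in range(n_candidates):
        weights = [rng.gauss(0.0, 1.0) for _ in range(lookback)]
        s = sharpe(rule_returns(weights, in_sample))
        if s > best_sharpe:
            best_sharpe, best_weights = s, weights
    # The sales pitch stops here; the honest check is the same rule on unseen data.
    return best_sharpe, sharpe(rule_returns(best_weights, out_of_sample))


if __name__ == "__main__":
    trials = [backtest_experiment(seed=s) for s in range(10)]
    annualize = 250 ** 0.5  # a common convention for daily returns
    mean_in = sum(t[0] for t in trials) / len(trials) * annualize
    mean_out = sum(t[1] for t in trials) / len(trials) * annualize
    print(f"best-of-200 rule, in-sample Sharpe (annualized): {mean_in:.2f}")
    print(f"same rule, out-of-sample Sharpe (annualized):    {mean_out:.2f}")
```

Because the simulated market has no structure at all, the selected rule's impressive in-sample Sharpe ratio comes entirely from choosing the luckiest of many candidates. Out of sample it should hover around zero. The exact numbers depend on the seed, but the gap between the two lines is the point.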

I am not trying to argue that we should avoid increasing our understanding of social systems, or even that we should avoid using our understanding to guide policy. What I am trying to convince you of is that the only way to do this is through critical and well-intentioned discourse. Belittling the social sciences, or trying to argue from the authority of past success in the ‘hard’ sciences, is not conducive to such a discussion. Instead, we should embrace a plurality of methods, be modest about our past accomplishments, and be mindful that just because our tools worked in one domain doesn’t mean that they are likely to work in another, at least not without significant work and give-and-take. Further, this discussion cannot be confined to a community of experts in some methodological ghetto, but should do its best to embrace as many voices as possible, hopefully including those of the public that policy is trying to affect.

References

Bailey, D. H., Borwein, J. M., de Prado, M. L., & Zhu, Q. (2014). Pseudo mathematics and financial charlatanism: the effects of backtest overfitting on out-of-sample performance. Notices of the AMS, 61(5): 458-471.

Freeman, L. C. (2004). The development of social network analysis: A study in the sociology of science. Vancouver: Empirical Press.

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S., & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature, 457(7232): 1012-1014.

Kahan, D. M., Peters, E., Dawson, E. C., & Slovic, P. (2013). Motivated numeracy and enlightened self-government. Yale Law School, The Cultural Cognition Project, Working Paper No. 116.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: traps in big data analysis. Science, 343(6176): 1203-1205.


About Artem Kaznatcheev
From the ivory tower of the School of Computer Science and Department of Psychology at McGill University, I marvel at the world through algorithmic lenses. My specific interests are in quantum computing, evolutionary game theory, modern evolutionary synthesis, and theoretical cognitive science. Previously I was at the Institute for Quantum Computing and Department of Combinatorics & Optimization at the University of Waterloo and a visitor to the Centre for Quantum Technologies at the National University of Singapore.

15 Responses to Big data, prediction, and scientism in the social sciences

  1. Adam Benton says:

    I think Hossenfelder has an excellent point. With the advent of computers we’ve been able to gather and analyse way more data than before, yet in many institutions the “social sciences” haven’t been taught these new techniques. Here at UoL, for example, there were no courses on statistics or genetics taught by the archaeology, classics & Egyptology department (under which I fell), and there were precious few opportunities (count: one basic class) to learn them in other departments.

    In short, advances have presented us with a sea of data, but inadequate skills are being taught. I suspect the gulf in the small-world work can be blamed in large part on that, with more traditional social scientists simply not equipped with the tools to understand the research by the physicists. We’re looking at entire generations of anthropologists/archaeologists here with only a rudimentary understanding of genetics. How will they be able to effectively engage with the literature now so important to their field?

    Don’t get me wrong, I think that there can be a fair bit of “methodological intimidation” and hubris from hard scientists coming along with their fancy new techniques. They certainly need to climb down out of their ivory towers and make their processes more understandable to the entire field. But I think the solution is a two-way street. Your regular social scientists need to be willing to climb up a bit and learn and use some of the new techniques being brought to the table.

    This is one of the critical issues I see with the whole scientism issue. Sure, physicists can be a bit arrogant over their methods; but once the cry of “scientism” has been given the more traditional researchers will turtle up, ignoring any and all advances brought by those dastardly outsiders.

    Hard scientists need to be less condescending, soft scientists need to be more willing to change.

    • I agree with you that everybody should learn more theory, but that is not limited to social scientists. I think you would be surprised by how poor the statistical training of many experimental biologists is, and how non-existent their mathematical training is. However, that doesn’t stop them from doing a good job.

      My point wasn’t that social scientists shouldn’t learn methods from physics or other fields. My point was that the transmission of these tools has to be through a dialogue between the disciplines not through imposition from the outside. Of course, once these tools start to be assimilated, it will become important to introduce them into the undergraduate curriculum.

      • Adam Benton says:

        Don’t get me wrong, I agree with pretty much everything you’re saying. It should be a dialogue and people without hardcore training in the hardcore subjects can do excellent work.

        It’s just that I’ve repeatedly seen the notion of “scientism” being used as an excuse to halt the dialogue. Once that criticism has been levelled at an idea, it is dismissed. Again, I agree on most of your points. The “hard” scientists are not without fault, but the more traditional researchers will often get very reactive and defensive over the issue, and that can be just as counter-productive as any arrogant physicist waving around their statistics like they’re the second coming.

        I think we’re violently agreeing here. There should be a dialogue, and you’re pointing out what the other fields can do better. I’m just saying the traditional workers can do things better too; and labelling something scientism (and the connotations that come with it) often isn’t one of them.

        • Adam Elkus says:

          I agree with the *other* Adam on this point. That being said, I don’t think framing this as hard scientists vs. social scientists quite captures the variation in prejudice.

          Social scientific fields are, like the nations in Benedict Anderson’s theory, imagined communities that use narratives to paper over the fact that many of them are synthetic composites of a series of perhaps disjunctive components. Some social science fields are happy to import and borrow from different fields that they feel are more “scientific” and “hard” than they are. But the problem is that their choices about which fields to take from and which fields to ignore are often much too restrictive, and often colored by prejudice, misunderstanding, and/or simple lack of time, interest, and incentive.

  2. Shecky R says:

    The point about the social, and even biological, sciences is that they deal with humans or other living things, having far more variables and complexity. This means they are actually harder sciences… easy to do sloppily, but exceedingly difficult to do well because of their inherent imprecision relative to sciences based on reductionist principles operating on non-living subjects. Not all sciences are equal, and frankly I believe ‘hard’ scientists generally have a greater appreciation for uncertainty in science, while soft scientists too often display a bloated/unwarranted sense of certainty in their fields, which needs to be called out. Shining a light on the weaknesses/limitations of the soft sciences isn’t so much condescending, as it is simply part of routine critical inquiry.

    • The point about the social, and even biological, sciences is that they deal with humans or other living things, having far more variables and complexity. This means they are actually harder sciences…

      Although I agree with this to some extent, I think it is worth reflecting on, because I am not sure it is necessarily a property of the systems under study as opposed to the history of how we happened to study them. I feel like some of this comes from the freedom that physics had to shape the sort of questions it asks and what its ontology looks like, because it was able to isolate itself more and more from folk physics. For instance, when Hossenfelder writes at the end of her post:

      The social sciences will never be as “hard” as the natural sciences because there is much more variation among people than among particles and among cities than among molecules.

      It is important to note that particles and molecules are theoretical concepts that physics has built over its long history. These concepts have become so good at explaining the world around us that we have elevated them to the ontological status of reality. Now, before you cry “foul! We didn’t elevate them to reality, they always were reality”, I would like to remind you of quantum mechanics and the understanding it brings that the particle is a convenient description (one that breaks down at times) of a much less intuitive ‘reality’. Of course, as we have more paradigm shifts and our ontological ground-set continues to change, the specifics of ‘the most fundamental reality’ will also change with it.

      Now, let’s instead look at the terms ‘people’ and ‘cities’. Where did we get those from? The first comes from our most innate theory-of-mind and the second comes from various political decisions. Neither of these terms shows the rich development that one would expect from years of scientific discourse and redefinition, instead of being imposed from the outside.

      Consider: if we had stuck to atoms as our basic describers in physics/chemistry, we would have 100s of different types with various isotopes. Thankfully, after years of investigation, we were able to identify better terms in protons, neutrons, and electrons that more beautifully and succinctly summarized our understanding. As we probed those more, we decided that they were not reality either, and instead saw that for certain discussions the concept of quarks was enriching.

      What if the same holds for, say, humans? Maybe the reason they are so ‘hard’ to describe is that we are taking them as primitives? Of course, if I had any idea what a better primitive was, then I would be a great scientist, so I can’t propose one. However, I also can’t rule out that these terms need more development.

      We’ve definitely seen some of this development in biology, and gained some clarity as we went from individual animals to genes as our central units (of course, the cost of this clarity is still being disputed to some extent).

      To tie these strands together, I would like to appeal to something like the social brain hypothesis and suggest that one reason we were able to develop these terms more in physics than in the human sciences is that our folk theory of physics is less important to us than our folk theory of social interactions and other humans. As such, it was easier for us to question and erode that folk theory than our social folk theories.

      Not all sciences are equal, and frankly I believe ‘hard’ scientists generally have a greater appreciation for uncertainty in science, while soft scientists too often display a bloated/unwarranted sense of certainty in their fields, which needs to be called out.

      Again, this is true to some extent, but not universal. For instance, in light of my previous rant, I find that physical scientists (especially ones that haven’t looked too much at the history or philosophy of their field) are much more wedded to, and certain of, their underlying ontology than social scientists are. If the case is really as I describe (and I don’t necessarily believe it is; I just wanted to offer an alternative perspective), then questioning your ontology is more important, and thus social scientists might be better placed. In reality, I can’t say much more about this, because I work with very few social scientists and because it is hard for me to overcome my own pro-hard-sciences bias.

      Shining a light on the weaknesses/limitations of the soft sciences isn’t so much condescending, as it is simply part of routine critical inquiry.

      Yet again, partial agreement from me. It is not the shining of the light on the weaknesses/limitations that is condescending; in fact, exploring limitations and assumptions is fundamental to critical inquiry. It is how those weaknesses/limitations are presented. If one simply straw-mans the theory, or presents weaknesses one assumes social scientists have without engaging with their literature, then it is not all that conducive to inquiry. For example, a lot of Hossenfelder’s case seems to be centered on her preferred views on free will, as if this were not a discussion with a rich history in the social sciences. Of course, the whole history of engagement might be misguided, but she won’t know that unless she goes and reads some of it.

      • Adam Elkus says:

        “Now, let’s instead look at the terms ‘people’ and ‘cities’. Where did we get those from? The first comes from our most innate theory-of-mind and the second comes from various political decisions. Neither of these terms shows the rich development that one would expect from years of scientific discourse and redefinition, instead of being imposed from the outside.

        Consider: if we had stuck to atoms as our basic describers in physics/chemistry, we would have 100s of different types with various isotopes. Thankfully, after years of investigation, we were able to identify better terms in protons, neutrons, and electrons that more beautifully and succinctly summarized our understanding. As we probed those more, we decided that they were not reality either, and instead saw that for certain discussions the concept of quarks was enriching”

        If we’re to be generous, the social sciences have been around since the mid-19th century. If we’re to be conservative, they really date to World War II and the expansion of the modern university research system. In either case, they are young sciences. Why didn’t the social sciences take hold earlier? It’s an interesting question, and in large part the historical answer comes down to the idea that the social sciences are the handmaidens of the modern state and its ordering processes. The role that the various quantitative and theoretical apparatuses of the social sciences played in enhancing state capability (statistical bureaus, censuses, modernization theory, operations research, etc.) for both internal and external power projection is well known.

        Lesser known is the popular interest in social science that came out as a way of explaining disruptive 19th century and 20th century economic, technological, and sociopolitical changes, encounters with Others, etc. Charles Tilly’s “Big Structures, Large Processes, Huge Comparisons” goes into how, in many ways, folk ideas of society and politics have not really changed much since the earlier 19th century days of Durkheim, Weber, Marx, etc. That’s both a function of academic habit as well as the enduring popular appeal of the frames that 19th century social science introduced.

  3. This is the best argument for interdisciplinarity! Usually one gets some points in grant applications for this criterion, but most of the time it is just familiar content wrapped in fancy language. My understanding of what you write is that interdisciplinarity is actually required for making good use of big data. True?

    • Thank you! I am not sure if I had that idea at the outset of my writing, I just wrote to clear my mind of thoughts that were rattling around. Often others see more clarity in my writing than I do! I think I would agree with your assessment that interdisciplinarity is required for good use of big data, especially if your use of data is meant to affect society. At the very least, one should be mindful and respectful of other fields to the best of their ability. Give them an honest chance.

      • With pleasure! I would be interested to see if this argument could be developed more. To have access to big data is awesome, but it kind of contradicts some ingredients of the Cartesian method (which is the basis of the scientific method, pre big data age), namely: “The third, to conduct my thoughts in such order that, by commencing with objects the simplest and easiest to know, I might ascend by little and little, and, as it were, step by step, to the knowledge of the more complex; assigning in thought a certain order even to those objects which in their own nature do not stand in a relation of antecedence and sequence.”
        If I may, I discussed this here, but it would be much nicer to see whether this clash between the Cartesian method and big data already has, or in the near future will have, effects in the real world.

  4. You wrote: using the undue authority of science to drive social change
    Did you mean: using the authority of science to unduly drive social change

  5. Pingback: Models, modesty, and moral methodology | Theory, Evolution, and Games Group

  6. Pingback: garrodius

  7. Pingback: A Theorist’s Apology | Theory, Evolution, and Games Group

  8. Pingback: Weapons of math destruction and the ethics of Big Data | Theory, Evolution, and Games Group
