Weapons of math destruction and the ethics of Big Data

I don’t know about you, dear reader, but during my formal education I was never taught ethics or social consciousness. I even remember sitting around with my engineering friends who had to take a class in ethics and laughing at the irrelevance and futility of it. To this day, I have a strained relationship with ethics as a branch of philosophy. However, despite this villainous background, I ended up spending a lot of time thinking about cooperation, empathy, and social justice. With time and experience, I started to climb out of the Dunning-Kruger hole and realize how little I understood about being a useful member of society.

One of the important lessons I’ve learnt is that models and algorithms are not neutral, and come with important ethical considerations that we as computer scientists, physicists, and mathematicians are often ill-equipped to see. For exploring the consequences of this in the context of the ever-present ‘big data’, Cathy O’Neil’s blog and alter ego mathbabe has been extremely important. This morning I had the opportunity to meet Cathy for coffee near her secret lair on the edge of Lower Manhattan. From this writing lair, she is working on her new book Weapons of Math Destruction and “arguing that mathematical modeling has become a pervasive and destructive force in society (in finance, education, medicine, politics, and the workplace) and showing how current models exacerbate inequality and endanger democracy and how we might rein them in”.

I can’t wait to read it!

In case you are impatient like me, I wanted to use this post to share a selection of Cathy’s articles along with my brief summaries for your browsing enjoyment:

Let them game the model (February 3, 2012)

The fear of ‘gaming the model’ is a frequent source of opposition to transparency; this argument is vapid. If your model can be gamed, then it isn’t measuring what you think it is measuring but a poor proxy, and so your model needs improvement. If your model were actually measuring what it asserts to measure, then gaming the model would be equivalent to getting better at the thing the model measures; that is something we desire. Transparency can only improve models. For a concrete example, consider text laundering.

The complexity feedback loop of modeling (January 8, 2013)

As a model becomes more popular and standardized, the little marginal effects that were abstracted away during its initial construction become important. People then have an incentive to focus on those details and complexify the model to account for them. Sometimes this can be good, but in the prototypical example of finance it is bad. To help manage risk, bankers introduced derivatives, but this created the new and more difficult task of figuring out how to judge the risk of those derivatives.

This was one of the inspirations for my post on Mathematics in finance and hiding lies in complexity.

Modeling in plain English (March 17, 2013)

The ability to understand or explain a model, its interpretability, is often at odds with accuracy, and we have to balance the two aspects carefully. For tools that directly affect people’s lives, like their access to credit, it is important to model the individuals in such a way that the company’s decision and the model it is based on can be explained to the person. In the case of the credit card companies this interpretability is demanded by law.

I explore similar themes from the philosophy of science in Machine learning and prediction without understanding.

Big data and surveillance (April 24, 2013)

Is “almost all of big data … devoted to surveillance”? In the past, Todd Papaioannou encouraged consumers to think of the ‘never throw data away’ approach not as a corporate Big Brother but as a service of personalized consumer experience. Today, we have Raytheon’s RIOT software for tracking the location and predicting the behavior of individuals. Tomorrow, our children will be “the most stalked generation” given the abuses of educational privacy laws by companies like inBloom. So it seems that data as surveillance is not a myth, but we can try to address it by realizing that (1) anonymization doesn’t work with large databases, (2) there is no longer a clear line between sensitive and nonsensitive data, (3) the most vulnerable people don’t understand the threat, (4) Europe is well ahead of the US in privacy policy, and (5) not just collection but also usage of data should be limited.

Technocrats and big data (May 31, 2013)

The congressional subcommittee on big data was underwhelming, consisting mostly of big-data chest-thumping, with only one congressman bringing up the importance of skepticism. There is a risk of big data proponents becoming the new technocrats that promote their ‘good’ policies with the “side effect of either reinforcing already dominant groups or weakening already frail ones”.

I discuss similar concerns in Models, modesty, and moral methodology.

The politics of data mining (June 22, 2013)

Data miners have a lot of professional mobility, but they still divide into two political camps: “people who want to cause things to happen and people who want to prevent things from happening”. The former are innovating or ‘innovating’ in start-ups, and the latter are covering asses in big corporations and government.

Should lawmakers use algorithms? (August 5, 2013)

It is tempting to introduce algorithms into law, and thus to have law benefit from the continual improvement of algorithms in the same way that Google improves search. However, this is misguided because laws should not be black boxes, especially when the powerful are the ones who get to write them and can thus further increase information asymmetry. Since programming and mathematics themselves are not accessible to the average person, simply making the code available does nothing to prevent a black box. One could consider using algorithms in the enforcement instead of the definition of the law, and this can be helpful as long as the enforcement agency is mindful of the biases of its approach and doesn’t implement something racist like Stop-and-Frisk.

Working in the NYC Mayor’s office (September 10, 2013)

When working with personal data, especially the data of the most vulnerable populations, it is important to put the person, their agency, and privacy ahead of efficiency. This is rarely done in finance, where the data feels like the anonymized output of the market machine, or advertising, where the person is merely prey.

How to lie with statistics (February 3, 2014)

Comparison of the forthcoming Weapons of Math Destruction to Darrell Huff’s (1954) How to Lie with Statistics (online pdf): which lessons translate and which don’t? Some of the failures of translation are interesting in their own right, such as how “data is so big as to be inaccessible” and so we can’t be misled by charts because they are no longer presented. The most transferable tips are selection bias, survivorship bias, and the semi-attached figure: confusing people about topic A by discussing a not directly relevant but related topic B, something that is even more common when data is too big to look at.

No, Sandy Pentland, let’s not optimize the status quo (May 2, 2014)

A negative commentary on Alex Pentland’s Social Physics, largely in agreement with Nicholas Carr’s review of the book: “It will encourage us to optimize the status quo rather than challenge it.” A prime example of this is sexism in orchestras; there, performing from behind a sheet greatly increased the chance of women succeeding in their audition. This sort of transformative change is possible only if we recognize that social categories are not just theoretical constructs but are embedded in the minds of decision makers, which is impossible if we ignore them as Pentland suggests.

This is close to some of my thoughts in Big data, prediction, and scientism in the social sciences.

Podesta’s Big Data report to Obama: good but not great (May 5, 2014)

Podesta’s recommendation to “Expand Technical Expertise to Stop Discrimination” is an important acknowledgement of the potential pitfalls of Big Data, but does not go far enough. Expertise is not sufficient, since it is too easy to hide discrimination in proprietary or not-apparent models. Once that expertise is accumulated, it should be given the legislated power to investigate these secret models.

Inside the Podesta Report: Civil Rights Principles of Big Data (May 7, 2014)

Podesta’s report to Obama on Big Data has done well to recognize the implications of Big Data for civil rights, in particular by focusing on Civil Rights Principles for the Era of Big Data: (1) stop high-tech profiling, (2) ensure fairness in automated decisions, (3) preserve constitutional principles, (4) enhance individual control of personal information, and (5) protect people from inaccurate data. However, the report could have made better use of finance as a case study of mistakes made with Big Data, instead of just highlighting the industry’s use of data to detect fraud.

Ignore data, focus on power (May 20, 2014)

The problem with data, open or otherwise, is that it is used to wield power. When data is made available to the public, it is seldom data about the rich or powerful, but almost always about the unaware or powerless. “The power is where the data isn’t” and we should be mindful of that.

The business of big data audits: monetizing fairness (May 23, 2014)

Given the recent bad press on abuses of big data, it seems natural to aim to create a business of auditing big data companies. Done as black-box testing, this might be feasible and not as bad as it seems at first sight as long as the auditing is done transparently and not through some proprietary audit process that we can’t trust.

The dark matter of big data (June 25, 2014)

A lot of for-profit predictive modeling is hidden from us, and we don’t even know that we are being modeled. These tools often try to live at the edge of the law: if you see “regulatory gray area”, it should make you think “big data dark matter lives here”. Such applications can be particularly dangerous when they are used to limit our access to employment, credit, or health insurance. The latter issue is only becoming more dangerous with the increased popularity of the quantified self movement.

What constitutes evidence? (July 7, 2014)

What should we count as evidence when the system is deliberately set up to make it nearly impossible to collect? This matters when we look at the effects of private data, such as medical data, being used by proprietary algorithms to, say, bias hiring decisions.

Gillian Tett gets it very wrong on racial profiling (August 25, 2014)

Gillian Tett, although able to predict in 2006 the looming financial crisis, can still be fooled by the illusory neutrality of algorithms. But algorithms are just a tool and no tool is truly neutral. “No algorithm focused on human behavior is neutral. Anything which is trained on historical human behavior embeds and codifies historical and cultural practices.”

Of course, this is only a tiny fraction of the great posts by mathbabe; she is very prolific. Did I miss a particularly great post? My summaries are a rough guide and don’t do justice to the entire content of her posts; I recommend subscribing to her blog and exploring her archives for yourself.

To the regular readers: let me know if you enjoy this sort of compilation of links and I will do similar posts in the future for other blogs I follow.


About Artem Kaznatcheev
From the Department of Computer Science at Oxford University and Department of Translational Hematology & Oncology Research at Cleveland Clinic, I marvel at the world through algorithmic lenses. My mind is drawn to evolutionary dynamics, theoretical computer science, mathematical oncology, computational learning theory, and philosophy of science. Previously I was at the Department of Integrated Mathematical Oncology at Moffitt Cancer Center, and the School of Computer Science and Department of Psychology at McGill University. In a past life, I worried about quantum queries at the Institute for Quantum Computing and Department of Combinatorics & Optimization at University of Waterloo and as a visitor to the Centre for Quantum Technologies at National University of Singapore. Meander with me on Google+ and Twitter.

14 Responses to Weapons of math destruction and the ethics of Big Data

  1. vznvzn says:

    yep mathbabe, a great/ distinct/ unique/ idiosyncratic voice in the blogosphere, a prolific writer, brilliant. she has huge insight into the 2008 crash and was working in a quant/ high freq trading firm at the time. great/ cogent insights into inequality, big data, wealth inequality, modelling, statistics, etc. more on big data incl occasional citing her. wait, you say you met her in person? no details on that? maybe that deserves a whole other blog? :)

  2. Pingback: Weapons of math destruction and the ethics of B...

  3. stephenk1 says:

    I do enjoy having my world view challenged and, man, this article challenges me to think very hard on my system of values, but how do I distinguish this line of thinking from a sophisticated form of Luddism?

    • I am glad to hear that you are so open minded. Why do you think that this is a sophisticated form of Luddism? Neither Cathy nor I are advocating that we abandon technology. In fact, she is a data scientist, and I am (sort of) a computer scientist (although the whole story is a bit more complicated); abandoning technology would be against our self-interest.

      My stance is that we just have to be more mindful of how technology embodies our ethical stances towards social justice. In particular, it is important in the case of social technologies to focus on them not as a means to an end (because we often don't fully understand the damage we do along the way) but analyze them as an end in themselves (is the mechanism fair and reasonable even if it doesn't achieve its aim? Even if someone tries to game it?). The post is also meant as a call-out to programmers and other people who specialize in technical fields to remind them that the social ramifications of their work might be more nuanced than they seem at first glance. This is especially important given that a typical technologist does not have much training in the social sciences, or a lot of first-hand experience with say being discriminated against.

  4. Pingback: Weapons of math destruction and the ethics of Big Data | Theory, Evolution, and Games Group | geistwc

  5. derek says:

    thank you for this!:) china (or at least the chinese government) is talking up big data a lot this year. this list of yours is gold (for me) coz i am writing up an article on this chinese focus right now. personally i have problems with big data because i am a philosopher, and “we” do not like numbers (sorry). my understanding of math stops at “buying 5 red apples at the market” (as in math and numbers make no sense except to the human observers therein). without at least a philosophical if not phenomenological examination of big data ideas, i just do not feel comfortable. maybe i just know too many internet startup type kids here in beijing, strange bunch people, smart, yet never seem to be able to wise up regarding the big questions of life (why, what and who we are that kind of things) and they replicate each other at an alarming rate while they themselves segregate from the “masses” i.e. the peasant workers. and big data imho somehow smells eugenics a bit, just a bit, though a bit is enough to make me twitch

  6. Pingback: Cataloging a year of blogging: the philosophical turn | Theory, Evolution, and Games Group

  7. Pingback: A year in books | Theory, Evolution, and Games Group

  8. Abel Molina says:

    Makes sense to limit abuse of power through data, though more data in the wrong hands seems to be in most cases a superficial aspect of the situation, not the root cause. In particular, the whole credit system as an important part of life already seems vastly sketchy, and personal data getting to the relevant companies without a clear authorization is just the cherry on the cake. Would recommend just not playing that game as much as possible and assuming the consequences [e.g. credit reports not being able to be produced], but cultures don’t change that much, especially because of random comments on the Internet.

    For cultural contrast, by the way, in Spain there was a compulsory ethics class in the third-last year of high school (wouldn’t say most people got anything out of it though), and also had to take it for a few years in primary school as the alternative to Catholic religion class. There is also a top-level law related to data collection that is pretty strict – don’t know how well that is faring in the realm of multinational entities, but it seems from Wikipedia like Groupon got a 20,000 euros fine for storing CVV numbers (http://es.wikipedia.org/wiki/Ley_Org%C3%A1nica_de_Protecci%C3%B3n_de_Datos_de_Car%C3%A1cter_Personal_de_Espa%C3%B1a). On the other hand, apparently invites to app/websites are disallowed too, and pretty sure it’s happening. There is also of course the whole thing where the police spies more on people than they said they would.

  9. Pingback: False memories and journalism | Theory, Evolution, and Games Group

  10. Pingback: An update | Theory, Evolution, and Games Group

  11. Pingback: Systemic change, effective altruism and philanthropy | Theory, Evolution, and Games Group

  12. Pingback: Computational kindness and the revelation principle | Theory, Evolution, and Games Group

  13. Pingback: Social algorithms and the Weapons of Math Destructions | Theory, Evolution, and Games Group
