Weapons of math destruction and the ethics of Big Data
September 5, 2014
I don’t know about you, dear reader, but during my formal education I was never taught ethics or social consciousness. I even remember sitting around with my engineering friends who had to take a class in ethics and laughing at its irrelevance and futility. To this day, I have a strained relationship with ethics as a branch of philosophy. However, despite this villainous background, I ended up spending a lot of time thinking about cooperation, empathy, and social justice. With time and experience, I started to climb out of the Dunning-Kruger hole and realize how little I understood about being a useful member of society.
One of the important lessons I’ve learnt is that models and algorithms are not neutral; they come with important ethical considerations that we as computer scientists, physicists, and mathematicians are often ill-equipped to see. For exploring the consequences of this in the context of the ever-present ‘big data’, Cathy O’Neil’s blog and alter ego mathbabe have been extremely important. This morning I had the opportunity to meet Cathy for coffee near her secret lair on the edge of Lower Manhattan. From this writing lair, she is working on her new book Weapons of Math Destruction, “arguing that mathematical modeling has become a pervasive and destructive force in society—in finance, education, medicine, politics, and the workplace—and showing how current models exacerbate inequality and endanger democracy and how we might rein them in”.
I can’t wait to read it!
In case you are impatient like me, I wanted to use this post to share a selection of Cathy’s articles along with my brief summaries for your browsing enjoyment:
Let them game the model (February 3, 2012)
The fear of ‘gaming the model’ is a frequent source of opposition to transparency, but this argument is vapid. If your model can be gamed then it isn’t measuring what you think it is measuring, only a poor proxy for it, and so your model needs improvement. If your model were actually measuring what it asserts to measure, then gaming the model would be equivalent to getting better at the thing the model measures; that is something we desire. Transparency can only improve models. For a concrete example, consider text laundering.
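To make the proxy argument concrete, here is a minimal, hypothetical sketch (the toy ‘essay grader’ and its scoring rules are my own invention, not Cathy’s): a grader that scores by word count, a proxy for vocabulary richness, rewards mindless padding, while a grader that measures the target quantity directly can only be ‘gamed’ by actually doing better.

```python
# Hypothetical illustration: gaming a proxy vs. gaming the real measure.

def proxy_grade(essay: str) -> int:
    """Grade by raw word count -- a poor proxy for vocabulary richness."""
    return len(essay.split())

def direct_grade(essay: str) -> int:
    """Grade by distinct words used -- the quantity we actually care about."""
    return len(set(essay.split()))

honest = "the quick brown fox jumps over the lazy dog"
gamed = "good good good good good good good good good good"

# The proxy is fooled by repetition; the direct measure is not.
assert proxy_grade(gamed) > proxy_grade(honest)    # padding beats the proxy
assert direct_grade(gamed) < direct_grade(honest)  # but not the real measure
```

In this toy setting, a transparent `direct_grade` has nothing to fear from gaming, which is the point of the post.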
The complexity feedback loop of modeling (January 8, 2013)
As a model becomes more popular and standardized, the small marginal effects that were abstracted away during its initial construction become important. People then have an incentive to focus on those details and complexify the model to account for them. Sometimes this can be good, but in the prototypical example of finance it is bad. To help manage risk, bankers introduced derivatives, but this created the new and more difficult task of judging the risk of those derivatives themselves.
This was one of the inspirations for my post on Mathematics in finance and hiding lies in complexity.
Modeling in plain English (March 17, 2013)
The ability to understand or explain a model, its interpretability, is often at odds with accuracy, and we have to balance the two carefully. For tools that directly affect people’s lives, like their access to credit, it is important to model individuals in such a way that the company’s decision, and the model it is based on, can be explained to the person. In the case of the credit card companies, this interpretability is demanded by law.
I explore similar themes from the philosophy of science in Machine learning and prediction without understanding.
Big data and surveillance (April 24, 2013)
Technocrats and big data (May 31, 2013)
The congressional subcommittee hearing on big data was underwhelming, consisting mostly of big-data chest-thumping, with only one congressman bringing up the importance of skepticism. There is a risk of big data proponents becoming the new technocrats who promote their ‘good’ policies with the “side effect of either reinforcing already dominant groups or weakening already frail ones”.
I discuss similar concerns in Models, modesty, and moral methodology.
The politics of data mining (June 22, 2013)
Data miners have a lot of professional mobility, but they still divide into two political camps: “people who want to cause things to happen and people who want to prevent things from happening”. The former are innovating or ‘innovating’ in start-ups, and the latter are covering asses in big corporations and government.
Should lawmakers use algorithms? (August 5, 2013)
It is tempting to introduce algorithms into law, and thus improve law through the continual improvement of algorithms in the same way that Google improves search. However, this is misguided because laws should not be black boxes, especially when the powerful are the ones who get to write them, further increasing information asymmetry. Since programming and mathematics are themselves not accessible to the average person, simply making the code available does nothing to prevent a black box. One could instead consider using algorithms in the enforcement, rather than the definition, of the law; this can be helpful as long as the enforcement agency is mindful of the biases of its approach and doesn’t implement something racist like Stop-and-Frisk.
Working in the NYC Mayor’s office (September 10, 2013)
When working with personal data, especially the data of the most vulnerable populations, it is important to put the person, their agency, and privacy ahead of efficiency. This is rarely done in finance, where the data feels like the anonymized output of the market machine, or advertising, where the person is merely prey.
How to lie with statistics (February 3, 2014)
A comparison of the forthcoming Weapons of Math Destruction to Darrell Huff’s (1954) How to Lie with Statistics (online pdf): which lessons translate and which don’t? Some of the failures of translation are interesting in their own right, such as how “data is so big as to be inaccessible” and so we can’t be misled by charts because they are no longer presented. The most transferable tips are selection bias, survivorship bias, and the semi-attached figure: confusing people about topic A by discussing a related but not directly relevant topic B; something that is even more common when data is too big to look at.
No, Sandy Pentland, let’s not optimize the status quo (May 2, 2014)
A negative commentary on Alex Pentland’s Social Physics, largely in agreement with Nicholas Carr’s review of the book: “It will encourage us to optimize the status quo rather than challenge it.” A prime example of this is sexism in orchestras, where auditioning from behind a screen greatly increased the chance of women succeeding. This sort of transformative change is possible only if we recognize that social categories are not just theoretical constructs but are embedded in the minds of decision makers, which is impossible if we ignore them as Pentland suggests.
This is close to some of my thoughts in Big data, prediction, and scientism in the social sciences.
Podesta’s Big Data report to Obama: good but not great (May 5, 2014)
Podesta’s recommendation to “Expand Technical Expertise to Stop Discrimination” is an important acknowledgement of the potential pitfalls of Big Data, but it does not go far enough. Expertise is not sufficient, since it is too easy to hide discrimination in proprietary or non-apparent models. Once accumulated, that expertise should be granted the legal power to investigate these secret models.
Podesta’s report to Obama on Big Data has done well to recognize the implications of Big Data for civil rights, in particular by focusing on Civil Rights Principles for the Era of Big Data: (1) stop high-tech profiling, (2) ensure fairness in automated decisions, (3) preserve constitutional principles, (4) enhance individual control of personal information, and (5) protect people from inaccurate data. However, the report could have made a stronger point of finance as a case study of mistakes made with Big Data, instead of just highlighting the use of data to detect fraud.
Ignore data, focus on power (May 20, 2014)
The problem with data, open or otherwise, is that it is used to wield power. When data is made available to the public, it is seldom data about the rich or powerful, but almost always about the unaware or powerless. “The power is where the data isn’t” and we should be mindful of that.
The business of big data audits: monetizing fairness (May 23, 2014)
Given the recent bad press on abuses of big data, it seems natural to aim to create a business of auditing big data companies. Done as black-box testing, this might be feasible and not as bad as it seems at first sight as long as the auditing is done transparently and not through some proprietary audit process that we can’t trust.
The dark matter of big data (June 25, 2014)
A lot of for-profit predictive modeling is hidden from us, and we don’t even know that we are being modeled. These tools often try to live at the edge of the law: if you see a “regulatory gray area”, it should make you think “big data dark matter lives here”. Such applications can be particularly dangerous when they are used to limit our access to employment, credit, or health insurance. The last issue is only becoming more dangerous with the increased popularity of the quantified-self movement.
What constitutes evidence? (July 7, 2014)
What should we count as evidence when the system is deliberately set up to make it nearly impossible to collect? This matters when we look at the effects of private data, such as medical data, being used by proprietary algorithms to, say, bias hiring decisions.
Gillian Tett gets it very wrong on racial profiling (August 25, 2014)
Gillian Tett, although she was able to predict the looming financial crisis in 2006, can still be fooled by the illusory neutrality of algorithms. But algorithms are just a tool, and no tool is truly neutral. “No algorithm focused on human behavior is neutral. Anything which is trained on historical human behavior embeds and codifies historical and cultural practices.”
Of course, this is only a tiny fraction of the great posts by mathbabe; she is very prolific. Did I miss a particularly great post? My summaries are a rough guide and don’t do justice to the full content of her posts; I recommend subscribing to her blog and exploring her archives for yourself.
To the regular readers: let me know if you enjoy this sort of compilation of links and I will do similar posts in the future for other blogs I follow.