The math that determines your whole life is racist & predatory, says Cathy O'Neil
O’Neil sees plenty of parallels between the usage of Big Data today and the predatory lending practices of the subprime crisis. In both cases, the effects are hard to track, even for insiders. Like the dark financial arts employed in the run-up to the 2008 financial crisis, the Big Data algorithms that sort us into piles of “worthy” and “unworthy” are mostly opaque and unregulated, not to mention generated (and used) by large multinational firms with huge lobbying power to keep it that way. “The discriminatory and even predatory way in which algorithms are being used in everything from our school system to the criminal justice system is really a silent financial crisis,” says O’Neil.
The effects are just as pernicious. Using her deep technical understanding of modeling, she shows how the algorithms used to, say, rank teacher performance are based on exactly the sort of shallow and volatile data sets that informed those faulty mortgage models in the run-up to 2008. Her work makes particularly disturbing points about how being on the wrong side of an algorithmic decision can snowball in incredibly destructive ways. A young black man, for example, who lives in an area targeted by crime-fighting algorithms that add more police to his neighborhood because of higher violent crime rates will necessarily be more likely to be targeted for any petty violation, which adds to a digital profile that could subsequently limit his credit, his job prospects, and so on. Yet neighborhoods where white-collar crime is more likely aren’t targeted in this way.
In higher education, the use of algorithmic models that rank colleges has led to an educational arms race where schools offer more and more merit-based rather than need-based aid to students who’ll make their numbers (and thus their rankings) look better. At the same time, for-profit universities can troll for data on economically or socially vulnerable would-be students and find their “pain points,” as a recruiting manual for one for-profit university, Vatterott, describes it, in any number of online questionnaires or surveys those students may have unwittingly filled out. The schools can then use this info to funnel ads to welfare mothers, recently divorced and out-of-work people, those who’ve been incarcerated or even those who’ve suffered injury or a death in the family.
The usage of Big Data to inform all aspects of our lives, with and without our knowledge, matters not just because it dictates the life chances that are presented or denied to us. It also matters because the artificial intelligence systems that are being developed and deployed are learning from the data that is collected. And those AI systems, themselves, can be biased and inaccessible to third-party audit.
Corporations are increasingly the substitutes for core state institutions. And as they collect and analyze data in bulk and hide away their methods of presenting data on behalf of states (or in lieu of past state institutions) the public is left vulnerable not just to corporate malice, but disinterest. Worse, this is a kind of disinterest that is difficult to challenge in the absence of laws compelling corporate transparency.
Chris (who I’m reblogging this from) is one of the smartest people I know, and since grad school he’s been the #1 person I’d listen to when it comes to technology, privacy, and the like. This is really, really crucial stuff.
Data is a weapon that has its own biases. It’s a weapon that can be used for both good and evil, but more often than not it looks exactly like the people wielding it.
I build the processes surrounding big data (along with the data generation), and I can tell you that accuracy isn’t the most important part. Who you are is represented by numbers, and updating those numbers on the small scale, individually, doesn’t affect the whole. Your numbers, what ‘your value is’ to individual corporations, aren’t individually meaningful to them, even if they’re wrong. What the system thinks of you is what you are, because to marketing and internal organizations you’re dollar signs, or you’re a target audience that isn’t yet tapped.
Just remember what Target is capable of when it has access to your shopping history: a lot. In many ways that’s good, because it reduces waste and eliminates inefficiency. But it also reduces you to a set of decisions you’ve made, without knowing the scenario behind them. As @quirksintech points out, you don’t have access to this data. There’s a lot of data that Amazon, Google, Apple, Target, and Wal-Mart keep on people that you’ll just never have access to and can’t audit. It’s not like disputing an entry on your credit report, which would have been the most likely analogy ten years ago.
This is a larger, more damning issue when you apply it to the quantitative ranking of people, when you eliminate relationships and locality and, again, look at people purely as numbers. As a software engineer, I’ll readily admit that there are a lot of scenarios in Big Data where we can’t properly engineer solutions that deal with context. So we just ‘guess’, and create coefficients that we hope will reduce some of the variance in the data, so that decisions made on it fall within an acceptable margin of error.
But if you’re stuck in the margin of error, you’re in the dark. You don’t know why you’re there, you often have no recourse, and it’s not a good investment of time for corporations to fix it, because they’ve already decided that you’re within the acceptable margin of error.
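To make that concrete, here’s a toy sketch in Python. Everything in it is made up for illustration: the people, the scores, the 0.5 cutoff, and the 10% acceptable error rate are assumptions, not numbers from any real system. It only shows the shape of the problem: the model is judged on its aggregate error, and the person it gets wrong is just part of that aggregate.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    score: float        # what the (hypothetical) model computed
    creditworthy: bool  # ground truth, which the system never looks up


def evaluate(people, threshold, acceptable_error):
    """Return (error_rate, ships, names of misclassified people)."""
    wrong = [p.name for p in people
             if (p.score >= threshold) != p.creditworthy]
    error_rate = len(wrong) / len(people)
    return error_rate, error_rate <= acceptable_error, wrong


# Ten invented people. "D" is creditworthy but scored low.
people = [
    Person("A", 0.91, True),
    Person("B", 0.78, True),
    Person("C", 0.35, False),
    Person("D", 0.42, True),
    Person("E", 0.15, False),
    Person("F", 0.66, True),
    Person("G", 0.22, False),
    Person("H", 0.83, True),
    Person("I", 0.48, False),
    Person("J", 0.57, True),
]

error_rate, ships, wrong = evaluate(people, threshold=0.5,
                                    acceptable_error=0.10)
print(f"error rate: {error_rate:.0%}, ships: {ships}, wrong about: {wrong}")
```

One person in ten is misjudged, which sits exactly at the acceptable margin in this sketch, so the model ships. Nothing in the loop ever goes back and tells "D" why.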


















