Data and Racism in Machine Learning?
We often hear stories these days about racism in machine learning algorithms, but the subtlety is often missing from them. I've been reading about this recently and found the following quote very telling:
A wave of scholarship, triggered by the ProPublica report, illuminated the statistical challenge at the heart of the argument: Given that the underlying “base rate” of rearrest is higher for blacks than for whites, it is mathematically inevitable that the burden of false positives will fall more heavily on black defendants than on white ones. In other words, given that more black defendants than white defendants actually do have a high risk of reoffending, a “high risk” label that is correct 70% of the time for both white and black defendants will still mis-label more black than white defendants as high risk. A study titled “Inherent Tradeoffs in the Fair Determination of Risk Scores” proved mathematically that when rearrest rates are not equal between races, a well-calibrated tool like Northpointe’s – that is, a tool that is mistaken equally often about whites and about blacks – will inevitably have more false positives for blacks. As the authors of a second study explained, to equalize the error rates, one would have to make the tool itself race conscious, and “set multiple, race-specific [risk] thresholds.”
In short, an intuitive understanding of equal protection cannot square with the mathematics of predictive risk scoring. Under real conditions, a tool that is equally often mistaken about white and black defendants will more often send blacks to jail by mistake than send whites to jail by mistake. But an explicitly race-conscious risk assessment tool, that predicted scores differently for whites than for blacks, would itself face serious constitutional challenges. An understanding of “equal protection” that would require race-blindness, and simultaneously require that races are burdened equally by prediction errors, simply does not leave room for risk assessment tools to operate.
The quote is from "The Challenges of Prediction: Lessons from Criminal Justice."
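To make the arithmetic in the quote concrete, here is a minimal sketch in Python. The base rates (50% and 30%), the 60% true positive rate, and the group names are made-up numbers chosen for illustration, not figures from the ProPublica report or the studies quoted above; only the 70% "correct when labeled high risk" figure echoes the quote. The sketch assumes both groups share the same precision and the same true positive rate, and shows that the false positive rate still differs once the base rates differ.

```python
def false_positive_rate(base_rate, ppv, tpr):
    """False positive rate implied by a group's base rate, the tool's
    precision (PPV), and its true positive rate (TPR).

    All quantities are fractions of the group's population.
    """
    # Share of the group that reoffends AND is labeled high risk.
    true_pos = tpr * base_rate
    # PPV = TP / (TP + FP), so FP = TP * (1 - PPV) / PPV.
    false_pos = true_pos * (1 - ppv) / ppv
    # FPR = FP / (share of the group that does not reoffend).
    return false_pos / (1 - base_rate)


# Hypothetical base rates of rearrest for two groups (illustrative only).
base_rates = {"group_a": 0.5, "group_b": 0.3}
ppv = 0.7  # a "high risk" label is correct 70% of the time in both groups
tpr = 0.6  # assumed: the tool flags 60% of actual reoffenders in both groups

fpr = {name: false_positive_rate(p, ppv, tpr) for name, p in base_rates.items()}

for name, rate in fpr.items():
    print(f"{name}: base rate {base_rates[name]:.0%}, "
          f"false positive rate {rate:.1%}")
```

With these made-up inputs, the group with the higher base rate ends up with a false positive rate of roughly 26%, versus roughly 11% for the other group, even though the label is equally reliable for both. Forcing the two false positive rates to match would mean changing the precision or the true positive rate for one group, which is the "race-specific thresholds" tradeoff the quote describes.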