Abstract
Machine learning algorithms are increasingly involved in sensitive decision-making processes with adverse implications for individuals. This paper presents mdfa, an approach that identifies the characteristics of the victims of a classifier's discrimination. We measure discrimination as a violation of multi-differential fairness, a guarantee that a black-box classifier's outcomes do not leak information on the sensitive attributes of a small group of individuals. We reduce the problem of identifying worst-case violations to matching distributions and predicting where sensitive attributes and the classifier's outcomes coincide. We apply mdfa to a recidivism risk assessment classifier and demonstrate that, for individuals with little criminal history, identified African-Americans are three times more likely to be considered at high risk of violent recidivism than similar non-African-Americans.
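For intuition, the "no leakage" guarantee can be read as a differential-privacy-style bound on outcome disparities within subgroups. The following is a minimal sketch of such a condition, assuming a classifier $f$, a sensitive attribute $A$, and a collection $\mathcal{C}$ of subgroups; the paper's exact definition may differ in its quantifiers and parameterization.

% epsilon-multi-differential fairness (illustrative sketch):
% within every subgroup S in the collection C, the classifier's
% outcome distribution is nearly independent of the sensitive
% attribute A, up to a multiplicative factor e^epsilon.
\[
  e^{-\varepsilon} \;\le\;
  \frac{\Pr\!\left[f(X) = y \mid A = a,\; X \in S\right]}
       {\Pr\!\left[f(X) = y \mid A = a',\; X \in S\right]}
  \;\le\; e^{\varepsilon}
  \qquad \forall\, S \in \mathcal{C},\ \forall\, y,\ \forall\, a, a'.
\]

Under this reading, a worst-case violation is a subgroup $S$ for which the ratio above departs most from $1$, which is what mdfa searches for.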