Abstract
We propose an approach to training machine learning models that are fair in the sense that their performance is invariant under certain perturbations of the features. For example, the performance of a resume screening system should be invariant under changes to the applicant's name. We formalize this intuitive notion of fairness by connecting it to the original notion of individual fairness put forth by Dwork et al. and show that the proposed approach achieves this notion of fairness. We also demonstrate the effectiveness of the approach on two machine learning tasks that are susceptible to gender and racial biases.