Abstract
Fairness-aware learning studies the problem of
building machine learning models that are subject
to fairness requirements. Counterfactual fairness
is a notion of fairness derived from Pearl’s causal
model, which considers a model fair if, for a particular individual or group, its prediction in the real
world is the same as that in the counterfactual world
where the individual(s) had belonged to a different demographic group. However, an inherent limitation of counterfactual fairness is that it cannot
be uniquely quantified from the observational data
in certain situations, due to the unidentifiability of
the counterfactual quantity. In this paper, we address this limitation by mathematically bounding
the unidentifiable counterfactual quantity, and develop a theoretically sound algorithm for constructing counterfactually fair classifiers. We evaluate
our method in experiments on both synthetic
and real-world datasets and compare it with
existing methods. The results validate our theory
and demonstrate the effectiveness of our method.