Measuring Fairness of Text Classifiers via Prediction Sensitivity. Krishna, S., Gupta, R., Verma, A., Dhamala, J., Pruksachatkun, Y., & Chang, K.-W. In Muresan, S., Nakov, P., & Villavicencio, A., editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5830–5842, Dublin, Ireland, May, 2022. Association for Computational Linguistics.
With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is a lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation – accumulated prediction sensitivity – which measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. It also correlates well with humans' perception of fairness. We conduct experiments on two text classification datasets – Jigsaw Toxicity and Bias in Bios – and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.
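To make the core idea concrete, the minimal sketch below illustrates prediction sensitivity in its simplest form: perturb a protected-attribute feature, measure how much a differentiable classifier's output changes, and accumulate that quantity over a dataset. The toy logistic-regression model, the feature layout, and the finite-difference step size are illustrative assumptions; the paper's accumulated prediction sensitivity metric uses its own weighting scheme, which is not reproduced here.

# Minimal sketch (illustrative, not the paper's exact formulation): estimate how
# sensitive each prediction is to a protected-attribute feature via a
# finite-difference perturbation, then average over a dataset.
import numpy as np

rng = np.random.default_rng(0)

def predict_proba(x, w, b):
    # Toy differentiable classifier: logistic regression on dense features.
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def prediction_sensitivity(x, w, b, protected_idx, eps=1e-4):
    # |f(x + eps * e_protected) - f(x)| / eps approximates the partial
    # derivative of the prediction with respect to the protected feature.
    x_plus = x.copy()
    x_plus[protected_idx] += eps
    return abs(predict_proba(x_plus, w, b) - predict_proba(x, w, b)) / eps

# Toy data: 200 examples, 10 features; feature 0 is assumed to encode the
# protected attribute (an assumption made for this sketch only).
X = rng.normal(size=(200, 10))
w = rng.normal(size=10)
b = 0.0

scores = [prediction_sensitivity(x, w, b, protected_idx=0) for x in X]
print("mean prediction sensitivity to the protected feature:", float(np.mean(scores)))

In the paper's text-classification setting, the sensitivity would be taken with respect to the model's input representation of a text example rather than a hand-placed dense feature; this sketch only conveys the perturb-and-accumulate idea.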
@inproceedings{krishna-etal-2022-measuring,
    title = "Measuring Fairness of Text Classifiers via Prediction Sensitivity",
    author = "Krishna, Satyapriya  and
      Gupta, Rahul  and
      Verma, Apurv  and
      Dhamala, Jwala  and
      Pruksachatkun, Yada  and
      Chang, Kai-Wei",
    editor = "Muresan, Smaranda  and
      Nakov, Preslav  and
      Villavicencio, Aline",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.401",
    doi = "10.18653/v1/2022.acl-long.401",
    pages = "5830--5842",
    abstract = "With the rapid growth in language processing applications, fairness has emerged as an important consideration in data-driven solutions. Although various fairness definitions have been explored in the recent literature, there is lack of consensus on which metrics most accurately reflect the fairness of a system. In this work, we propose a new formulation {--} accumulated prediction sensitivity, which measures fairness in machine learning models based on the model{'}s prediction sensitivity to perturbations in input features. The metric attempts to quantify the extent to which a single prediction depends on a protected attribute, where the protected attribute encodes the membership status of an individual in a protected group. We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness. It also correlates well with humans{'} perception of fairness. We conduct experiments on two text classification datasets {--} Jigsaw Toxicity, and Bias in Bios, and evaluate the correlations between metrics and manual annotations on whether the model produced a fair outcome. We observe that the proposed fairness metric based on prediction sensitivity is statistically significantly more correlated with human annotation than the existing counterfactual fairness metric.",
}
