Optimising Fairness Through Parametrised Data Sampling. González-Zelaya, V., Prangle, D., Salas, J., & Missier, P.
Improving machine learning models’ fairness is an active research topic, with most approaches focusing on specific definitions of fairness. In contrast, we propose ParDS, a parametrised data sampling method by which we can optimise the fairness ratios observed on a test set, in a way that is agnostic to both the specific fairness definition and the chosen classification model. Given a training set with one binary protected attribute and a binary label, our approach corrects the positive rate for both the favoured and unfavoured groups through resampling of the training set. We present experimental evidence showing that the amount of resampling can be optimised to achieve target fairness ratios for a specific training set and fairness definition, while preserving most of the model’s accuracy. We discuss conditions for the method to be viable, then extend it to handle multiple protected attributes. In our experiments we use three different sampling strategies and report results for three commonly used definitions of fairness on three public benchmark datasets: Adult Income, COMPAS and German Credit.
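
The mechanism the abstract describes, correcting per-group positive rates by resampling the training set and tuning the amount of resampling until a target fairness ratio is reached, can be sketched in a few lines. The Python below is a hypothetical illustration, not the authors' ParDS implementation: it oversamples positive examples of the unfavoured group by a parameter theta, trains a classifier, and reports the demographic parity ratio (the positive-rate ratio between groups) alongside accuracy. The synthetic data, function names, and the oversampling-only strategy are all assumptions made for this sketch.

# Hypothetical sketch of parametrised per-group resampling; not the authors' code.
# Assumes a binary protected attribute a (1 = favoured group) and a binary label y.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def resample_positive_rate(X, y, a, theta, rng):
    """Duplicate positive examples of the unfavoured group (a == 0)
    in proportion to theta, raising its positive rate in the training set."""
    pos_unfav = np.flatnonzero((a == 0) & (y == 1))
    extra = rng.choice(pos_unfav, size=int(theta * len(pos_unfav)), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx], a[idx]

def demographic_parity_ratio(y_pred, a):
    """Ratio of predicted positive rates: unfavoured group over favoured group."""
    return y_pred[a == 0].mean() / y_pred[a == 1].mean()

# Synthetic data with a label correlated with the protected attribute, for illustration.
n = 5000
a = rng.integers(0, 2, n)                       # protected attribute
X = rng.normal(size=(n, 3)) + a[:, None] * 0.5  # features correlated with a
y = (X.sum(axis=1) + rng.normal(size=n) > 0.5).astype(int)

# Sweep the resampling parameter and observe the fairness/accuracy trade-off.
for theta in [0.0, 0.5, 1.0, 2.0]:
    Xr, yr, ar = resample_positive_rate(X, y, a, theta, rng)
    pred = LogisticRegression().fit(Xr, yr).predict(X)
    print(f"theta={theta:.1f}  DP ratio={demographic_parity_ratio(pred, a):.3f}  "
          f"accuracy={(pred == y).mean():.3f}")

In the paper's framing, the resampling amount would be optimised, for example by searching over theta on a validation split, until the measured ratio reaches a chosen target such as the four-fifths rule's 0.8; the same loop applies to other fairness definitions by swapping the ratio being measured.
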
@article{gonzalez-zelaya_optimising_nodate,
	title = {Optimising {Fairness} {Through} {Parametrised} {Data} {Sampling}},
	abstract = {Improving machine learning models’ fairness is an active research topic, with most approaches focusing on specific definitions of fairness. In contrast, we propose ParDS, a parametrised data sampling method by which we can optimise the fairness ratios observed on a test set, in a way that is agnostic to both the specific fairness definition and the chosen classification model. Given a training set with one binary protected attribute and a binary label, our approach corrects the positive rate for both the favoured and unfavoured groups through resampling of the training set. We present experimental evidence showing that the amount of resampling can be optimised to achieve target fairness ratios for a specific training set and fairness definition, while preserving most of the model’s accuracy. We discuss conditions for the method to be viable, then extend it to handle multiple protected attributes. In our experiments we use three different sampling strategies and report results for three commonly used definitions of fairness on three public benchmark datasets: Adult Income, COMPAS and German Credit.},
	language = {en},
	author = {González-Zelaya, Vladimiro and Prangle, Dennis and Salas, Julián and Missier, Paolo},
	pages = {6},
}
