Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions. Thakur, H., Jain, A., Vaddamanu, P., Liang, P. P., & Morency, L. In Rogers, A., Boyd-Graber, J., & Okazaki, N., editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 340–351, Toronto, Canada, July 2023. Association for Computational Linguistics.
Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time- and compute-expensive, a variety of approaches have been previously proposed that debias a pre-trained model. While the majority of current state-of-the-art debiasing methods focus on changes to the training regime, in this paper, we propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models. Specifically, we empirically show that by fine-tuning a pre-trained model on only 10 debiased (intervened) training examples, the tendency to favor any gender is significantly reduced. Since our proposed method only needs a few training examples, we argue that our few-shot debiasing approach is highly feasible and practical. Through extensive experimentation, we show that our debiasing technique performs better than competitive state-of-the-art baselines with minimal loss in language modeling ability.
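
The core procedure the abstract describes — fine-tuning a pre-trained model on a handful of gender-balanced (intervened) sentences — can be sketched in a few lines. The following is a minimal, illustrative sketch, not the authors' released code: it assumes a BERT-style masked language model via the Hugging Face transformers library, and the ten example sentences and all hyperparameters are hypothetical placeholders, not values taken from the paper.

import torch
from torch.utils.data import DataLoader
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling)

# Hypothetical "intervened" examples: each profession sentence appears in
# both gendered variants, so neither gender is favored during fine-tuning.
intervened_sentences = [
    "He is a nurse.", "She is a nurse.",
    "He is an engineer.", "She is an engineer.",
    "He is a receptionist.", "She is a receptionist.",
    "He is a surgeon.", "She is a surgeon.",
    "He is a librarian.", "She is a librarian.",
]  # 10 examples, matching the few-shot budget reported in the abstract

# Model choice is illustrative; the paper does not prescribe this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

encodings = tokenizer(intervened_sentences, truncation=True, padding=True)
examples = [
    {"input_ids": ids, "attention_mask": mask}
    for ids, mask in zip(encodings["input_ids"], encodings["attention_mask"])
]

# Standard masked-LM fine-tuning: the collator randomly masks tokens
# and builds the corresponding labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)
loader = DataLoader(examples, batch_size=2, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative
model.train()
for epoch in range(3):  # a few epochs over 10 examples is cheap
    for batch in loader:
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

After fine-tuning, one simple way to probe the effect is to compare the masked-token probabilities the model assigns to "he" versus "she" in templated sentences (e.g., "[MASK] is a nurse."); the abstract reports that this few-shot intervention significantly reduces the tendency to favor either gender while costing little language-modeling ability.
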
@inproceedings{thakurLanguageModelsGet2023,
	address = {Toronto, Canada},
	title = {Language {Models} {Get} a {Gender} {Makeover}: {Mitigating} {Gender} {Bias} with {Few}-{Shot} {Data} {Interventions}},
	shorttitle = {Language {Models} {Get} a {Gender} {Makeover}},
	url = {https://aclanthology.org/2023.acl-short.30},
	doi = {10.18653/v1/2023.acl-short.30},
	abstract = {Societal biases present in pre-trained large language models are a critical issue as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time- and compute-expensive, a variety of approaches have been previously proposed that debias a pre-trained model. While the majority of current state-of-the-art debiasing methods focus on changes to the training regime, in this paper, we propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models. Specifically, we empirically show that by fine-tuning a pre-trained model on only 10 debiased (intervened) training examples, the tendency to favor any gender is significantly reduced. Since our proposed method only needs a few training examples, we argue that our few-shot debiasing approach is highly feasible and practical. Through extensive experimentation, we show that our debiasing technique performs better than competitive state-of-the-art baselines with minimal loss in language modeling ability.},
	urldate = {2024-07-29},
	booktitle = {Proceedings of the 61st {Annual} {Meeting} of the {Association} for {Computational} {Linguistics} ({Volume} 2: {Short} {Papers})},
	publisher = {Association for Computational Linguistics},
	author = {Thakur, Himanshu and Jain, Atishay and Vaddamanu, Praneetha and Liang, Paul Pu and Morency, Louis-Philippe},
	editor = {Rogers, Anna and Boyd-Graber, Jordan and Okazaki, Naoaki},
	month = jul,
	year = {2023},
	pages = {340--351},
}
