Abstract
Unjustified social stereotypes have recently been found to taint the predictions of NLP models, and a growing body of research therefore focuses on methods to mitigate social bias. Most proposed approaches update model parameters post hoc, running the risk of forgetting the predictive task of interest. In this work, we propose a novel way of debiasing NLP models by curating and debiasing their training data. To do so, we introduce an unsupervised pipeline that identifies which training instances mention stereotypes that tally with the stereotypes encoded in NLP models. We then either remove or augment these problematic instances and train NLP models on the resulting, less biased data. Within this pipeline, we propose three methods to excavate the stereotypes encoded in models, based on likelihoods, attention weights and vector representations. Experiments on natural language inference, sentiment analysis and question answering suggest that our methods debias downstream models more effectively than existing techniques.
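To make the pipeline concrete, below is a minimal, hypothetical sketch of the likelihood-based variant of this idea: each training instance that mentions a demographic group is scored by masking the group term and asking a masked language model how strongly it prefers that group over the alternatives; instances whose mentioned group matches the model's own preference beyond a threshold are dropped (or could instead be augmented with counterfactual group swaps). The group list, the threshold, the helper names `stereotype_gap` and `curate`, and the choice of `bert-base-uncased` are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of a likelihood-based filter for stereotype-aligned
# training instances; group terms, threshold and model are assumptions.
from transformers import pipeline

GROUPS = ["men", "women"]  # assumed demographic terms for illustration
unmasker = pipeline("fill-mask", model="bert-base-uncased")

def stereotype_gap(sentence: str, mentioned: str) -> float:
    """Mask the mentioned group term and compare the MLM's likelihood for it
    against the mean likelihood of the other group terms."""
    masked = sentence.replace(mentioned, unmasker.tokenizer.mask_token, 1)
    preds = unmasker(masked, targets=GROUPS)
    scores = {p["token_str"].strip(): p["score"] for p in preds}
    others = [s for g, s in scores.items() if g != mentioned]
    return scores.get(mentioned, 0.0) - sum(others) / max(len(others), 1)

def curate(dataset, threshold=0.2):
    """Keep instances whose mentioned group is not the one the model itself
    prefers; flagged instances could instead be augmented with group swaps."""
    return [(s, g) for s, g in dataset if stereotype_gap(s, g) <= threshold]

print(curate([("women are bad at math", "women")]))
```

The attention-based and representation-based scores mentioned in the abstract would slot in as alternative implementations of the scoring function, with the removal-or-augmentation step unchanged.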
Notes
- 1. e.g., [GRP] are good at math.
- 2.
- 3. 38 = 69 - Mean({31}).
- 4. -13.33 = 15 - Mean({62, 13, 10}) (see the sketch after these notes).
- 5.
- 6.
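Notes 3 and 4 follow the same pattern: a group's score minus the mean of the remaining groups' scores. Below is a tiny sketch of that arithmetic; the function name `gap` is an illustrative assumption rather than the paper's notation.

```python
# Hypothetical reading of the arithmetic in notes 3 and 4: a group's bias score
# is its raw value minus the mean of the other groups' values.
def gap(value: float, others: list[float]) -> float:
    return value - sum(others) / len(others)

print(gap(69, [31]))                     # 38.0,   as in note 3
print(round(gap(15, [62, 13, 10]), 2))   # -13.33, as in note 4
```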
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gaci, Y., Benatallah, B., Casati, F., Benabdeslem, K. (2023). Targeting the Source: Selective Data Curation for Debiasing NLP Models. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14170. Springer, Cham. https://doi.org/10.1007/978-3-031-43415-0_17
DOI: https://doi.org/10.1007/978-3-031-43415-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43414-3
Online ISBN: 978-3-031-43415-0
eBook Packages: Computer Science; Computer Science (R0)