
Targeting the Source: Selective Data Curation for Debiasing NLP Models

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14170)

Abstract

Unjustified social stereotypes have lately been found to taint the predictions of NLP models. Thus, an increasing amount of research focuses on developing methods to mitigate social bias. Most proposed approaches update the parameters of models post-hoc, running the risk of forgetting the predictive task of interest. In this work, we propose a novel way of debiasing NLP models by debiasing and curating their training data. To do so, we propose an unsupervised pipeline to identify which instances in the training data mention stereotypes that tally with the stereotypes encoded in NLP models. Then we either remove or augment these problematic instances, and train NLP models on less biased data. In this pipeline, we propose three methods to excavate stereotypes encoded in models using likelihoods, attention weights and vector representations. Experiments on the tasks of natural language inference, sentiment analysis and question answering suggest that our methods are better at debiasing downstream models than existing techniques.
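
The page does not reproduce the pipeline itself, but the likelihood-based probe named in the abstract can be sketched. Below is a minimal, hypothetical illustration assuming a masked-language-model probe in the spirit of the template in footnote 1 ("[GRP] are good at math."): mask the group slot and compare the model's fill-in probabilities across social groups. The function name and group list are illustrative only, not the authors' implementation; their actual code is in the repository linked in footnote 2.

    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    def group_likelihoods(template, groups):
        # Replace the [GRP] slot with the mask token and query the MLM.
        text = template.replace("[GRP]", tokenizer.mask_token)
        inputs = tokenizer(text, return_tensors="pt")
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0, 0]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos]
        probs = logits.softmax(dim=-1)
        # Single-token group names only; multi-token names need extra handling.
        return {g: probs[tokenizer.convert_tokens_to_ids(g)].item() for g in groups}

    scores = group_likelihoods("[GRP] are good at math.", ["men", "women"])
    print(scores)  # a large probability gap flags the sentence as stereotype-aligned

A large gap between the probabilities would mark the sentence, and training instances resembling it, as tallying with a stereotype the model already encodes; such instances are the ones the abstract proposes to remove or augment before retraining.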


Notes

  1. e.g., [GRP] are good at math.

  2. https://github.com/YacineGACI/Model-Aware-Data-Debiasing.

  3. 38 = 69 - Mean({31}).

  4. -13.33 = 15 - Mean({62, 13, 10}).

  5. https://github.com/pytorch/pytorch.

  6. https://github.com/huggingface/transformers.


Author information

Corresponding author

Correspondence to Yacine Gaci.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Gaci, Y., Benatallah, B., Casati, F., Benabdeslem, K. (2023). Targeting the Source: Selective Data Curation for Debiasing NLP Models. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science (LNAI), vol 14170. Springer, Cham. https://doi.org/10.1007/978-3-031-43415-0_17


  • DOI: https://doi.org/10.1007/978-3-031-43415-0_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43414-3

  • Online ISBN: 978-3-031-43415-0

  • eBook Packages: Computer Science, Computer Science (R0)
