Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings

  • Conference paper
Recent Trends in Analysis of Images, Social Networks and Texts (AIST 2020)

Abstract

Recent studies have found that supervised machine learning models can capture prejudices and stereotypes from their training data. Our study focuses on detecting gender stereotypes in word embeddings. We review prior work on the topic and propose a comparative study of existing methods of gender stereotype detection. We evaluate various word embedding models with these methods and conclude that the amount of bias depends neither on corpus size nor on the training algorithm, and does not correlate with the embeddings' performance on standard evaluation benchmarks.
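
To make the setup concrete, the sketch below (a minimal illustration with assumed names, not the paper's actual code) scores a word by its projection onto a "gender direction", in the spirit of Bolukbasi et al. [12]; here emb is a hypothetical placeholder for any word-to-vector mapping, such as a loaded word2vec model:

    import numpy as np

    def gender_lean(emb, word, pair=("he", "she")):
        """Projection of `word` onto a gender direction.

        Positive values lean toward pair[0], negative toward pair[1].
        `emb` is any mapping from a word to a NumPy vector
        (hypothetical placeholder; e.g. gensim's KeyedVectors).
        """
        # A single definitional pair defines the direction here;
        # Bolukbasi et al. [12] instead take the top principal
        # component over several such pairs.
        direction = emb[pair[0]] - emb[pair[1]]
        direction /= np.linalg.norm(direction)
        vec = emb[word] / np.linalg.norm(emb[word])
        return float(vec @ direction)

Under such a test, a stereotyped embedding would be expected to give occupation words such as "nurse" and "programmer" projections of opposite sign.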


Notes

  1. Throughout this paper we use L2-normalized Euclidean distance in all cases where we measure the distance between vectors: for normalized vectors, the choice between Euclidean distance and cosine similarity does not affect the results [17].
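
    For unit vectors the two measures are tied by the identity ||u − v||² = 2 − 2·cos(u, v), so Euclidean distance is a monotone function of cosine similarity and both induce the same ranking of word pairs. A quick numerical check of this identity (a minimal sketch; the 300-dimensional random vectors are arbitrary):

        import numpy as np

        rng = np.random.default_rng(0)
        u, v = rng.normal(size=300), rng.normal(size=300)
        u /= np.linalg.norm(u)  # L2-normalize both vectors
        v /= np.linalg.norm(v)

        cosine = u @ v
        euclid_sq = np.sum((u - v) ** 2)
        # For unit vectors: ||u - v||^2 = 2 - 2 * cos(u, v)
        assert np.isclose(euclid_sq, 2 - 2 * cosine)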

References

  1. Rogers, A., Hosur Ananthakrishna, S., Rumshisky, A.: What’s in your embedding, and how it predicts task performance. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, Association for Computational Linguistics, pp. 2690–2703, August 2018

  2. Senel, L.K., Utlu, I., Yucesoy, V., Koc, A., Cukur, T.: Semantic structure and interpretability of word embeddings. arXiv preprint arXiv:1711.00331 (2017)

  3. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)

  4. Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1521–1528. IEEE (2011)

  5. Hardt, M., Price, E., Srebro, N., et al.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems, pp. 3315–3323 (2016)

  6. Gordon, J., Van Durme, B.: Reporting bias and knowledge extraction (2013)

  7. Wagner, C., Garcia, D., Jadidi, M., Strohmaier, M.: It’s a man’s Wikipedia? Assessing gender inequality in an online encyclopedia. In: ICWSM, pp. 454–463 (2015)

  8. Font, J.E., Costa-jussà, M.R.: Equalizing gender biases in neural machine translation with word embeddings techniques. arXiv preprint arXiv:1901.03116 (2019)

  9. Mishra, A., Mishra, H., Rathee, S.: Examining the presence of gender bias in customer reviews using word embedding. arXiv preprint arXiv:1902.00496 (2019)

  10. Blodgett, S.L., Barocas, S., Daumé III, H., Wallach, H.: Language (technology) is power: a critical survey of “bias” in NLP. arXiv preprint arXiv:2005.14050 (2020)

  11. Schmidt, B.: Rejecting the gender binary: a vector-space operation (2015)

  12. Bolukbasi, T., Chang, K.W., Zou, J.Y., Saligrama, V., Kalai, A.T.: Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In: Advances in Neural Information Processing Systems, pp. 4349–4357 (2016)

  13. Caliskan, A., Bryson, J.J., Narayanan, A.: Semantics derived automatically from language corpora contain human-like biases. Science 356(6334), 183–186 (2017)

  14. Swinger, N., De-Arteaga, M., Heffernan IV, N.T., Leiserson, M.D.M., Kalai, A.T.: What are the biases in my word embedding? arXiv preprint arXiv:1812.08769 (2018)

  15. Zhao, J., Zhou, Y., Li, Z., Wang, W., Chang, K.W.: Learning gender-neutral word embeddings. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4847–4853 (2018)

  16. Kozlowski, A.C., Taddy, M., Evans, J.A.: The geometry of culture: analyzing meaning through word embeddings. arXiv preprint arXiv:1803.09288 (2018)

  17. Garg, N., Schiebinger, L., Jurafsky, D., Zou, J.: Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc. Nat. Acad. Sci. 115(16), E3635–E3644 (2018)

    Article  Google Scholar 

  18. Brunet, M.E., Alkalay-Houlihan, C., Anderson, A., Zemel, R.: Understanding the origins of bias in word embeddings. In: International Conference on Machine Learning, pp. 803–811 (2019)

  19. Dev, S., Phillips, J.: Attenuating bias in word vectors. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 879–887 (2019)

  20. Lauscher, A., Glavaš, G., Ponzetto, S.P., Vulić, I.: A general framework for implicit and explicit debiasing of distributional word vector spaces. arXiv preprint arXiv:1909.06092 (2019)

  21. Kaneko, M., Bollegala, D.: Gender-preserving debiasing for pre-trained word embeddings. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1641–1650 (2019)

  22. Hoyle, A.M., Wolf-Sonkin, L., Wallach, H., Augenstein, I., Cotterell, R.: Unsupervised discovery of gendered language through latent-variable modeling. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 1706–1716 (2019)

  23. Basta, C., Costa-jussà, M.R., Casas, N.: Extensive study on the underlying gender bias in contextualized word embeddings. Neural Comput. Appl. 1–14 (2020). https://doi.org/10.1007/s00521-020-05211-z

  24. Pujari, A.K., Mittal, A., Padhi, A., Jain, A., Jadon, M., Kumar, V.: Debiasing gender biased Hindi words with word-embedding. In: Proceedings of the 2019 2nd International Conference on Algorithms, Computing and Artificial Intelligence, pp. 450–456 (2019)

  25. Papakyriakopoulos, O., Hegelich, S., Serrano, J.C.M., Marco, F.: Bias in word embeddings. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 446–457 (2020)

  26. Gonen, H., Goldberg, Y.: Lipstick on a pig: debiasing methods cover up systematic gender biases in word embeddings but do not remove them. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 609–614 (2019)

  27. Shin, S., Song, K., Jang, J., Kim, H., Joo, W., Moon, I.C.: Neutralizing gender bias in word embedding with latent disentanglement and counterfactual generation. arXiv preprint arXiv:2004.03133 (2020)

  28. Gyamfi, E.O., Rao, Y., Gou, M., Shao, Y.: deb2viz: debiasing gender in word embedding data using subspace visualization. In: Eleventh International Conference on Graphics and Image Processing (ICGIP 2019), vol. 11373, p. 113732F. International Society for Optics and Photonics (2020)

  29. Wang, T., Lin, X.V., Rajani, N.F., McCann, B., Ordonez, V., Xiong, C.: Double-hard debias: tailoring word embeddings for gender bias mitigation. arXiv preprint arXiv:2005.00965 (2020)

  30. Kumar, V., Bhotia, T.S., Kumar, V., Chakraborty, T.: Nurse is closer to woman than surgeon? Mitigating gender-biased proximities in word embeddings. Trans. Assoc. Comput. Linguist. 8, 486–503 (2020)

  31. Rios, A., Joshi, R., Shin, H.: Quantifying 60 years of gender bias in biomedical research with word embeddings. In: Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, pp. 1–13 (2020)

  32. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

  33. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  34. Kutuzov, A., Fares, M., Oepen, S., Velldal, E.: Word vectors, reuse, and replicability: towards a community repository of large-text resources. In: Proceedings of the 21st Nordic Conference on Computational Linguistics (NoDaLiDa), pp. 271–276. Linköping University Electronic Press (2017)

  35. Rubenstein, H., Goodenough, J.B.: Contextual correlates of synonymy. Commun. ACM 8(10), 627–633 (1965)

  36. Luong, M.T., Socher, R., Manning, C.D.: Better word representations with recursive neural networks for morphology. In: Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pp. 104–113 (2013)

  37. Agirre, E., Alfonseca, E., Hall, K., Kravalová, J., Pasca, M., Soroa, A.: A study on similarity and relatedness using distributional and WordNet-based approaches. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 19–27 (2009)

  38. Mikolov, T., Yih, W.t., Zweig, G.: Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751 (2013)

  39. Bakarov, A.: A survey of word embeddings evaluation methods. arXiv preprint arXiv:1801.09536 (2018)

Download references

Acknowledgments

The reported study was funded by the Russian Foundation for Basic Research project 20-37-90153 “Development of framework for distributional semantic models evaluation”.

Author information

Corresponding author

Correspondence to Amir Bakarov.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bakarov, A. (2021). Did You Just Assume My Vector? Detecting Gender Stereotypes in Word Embeddings. In: van der Aalst, W.M.P., et al. Recent Trends in Analysis of Images, Social Networks and Texts. AIST 2020. Communications in Computer and Information Science, vol 1357. Springer, Cham. https://doi.org/10.1007/978-3-030-71214-3_1

  • DOI: https://doi.org/10.1007/978-3-030-71214-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-71213-6

  • Online ISBN: 978-3-030-71214-3

  • eBook Packages: Computer Science (R0)
