
Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian

  • Conference paper
Artificial Intelligence XL (SGAI 2023)

Abstract

Contextual language models, such as transformers, can solve a wide range of language tasks, ranging from text classification to question answering and machine translation. Like many deep learning models, their performance depends heavily on the quality and amount of data available for training. This poses a problem for low-resource languages, such as Norwegian, which cannot provide the necessary amount of training data. In this article, we investigate the use of multilingual models as a step toward overcoming the data sparsity problem for minority languages. Specifically, we study how words are represented by multilingual BERT models across two languages of interest: English and Norwegian. Our analysis shows that multilingual models encode English-Norwegian word pairs similarly, automatically aligning semantics across languages without supervision. It also shows that the embedding of a word encodes information about the language to which it belongs. We therefore believe that, in pre-trained multilingual models, knowledge from one language can be transferred to another without direct supervision, helping to solve the data sparsity problem for minority languages.


Notes

  1. https://huggingface.co/datasets. Visited: 19.01.2023.

  2. https://huggingface.co/bert-base-multilingual-cased.

  3. https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/.

  4. https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-23/.

  5. https://github.com/facebookresearch/MUSE.

  6. https://www.nltk.org/nltk_data/.

  7. https://www.elrc-share.eu/repository/browse/bilingual-english-norwegian-parallel-corpus-from-the-office-of-the-auditor-general-riksrevisjonen-website/a5d2470201e311e9b7d400155d0267060fffdc9258a741659ce9e52ef15a7c26/.


Author information


Corresponding author

Correspondence to Fabrizio Palumbo.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Aaby, P., Biermann, D., Yazidi, A., Mello, G.B.M.e., Palumbo, F. (2023). Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_4


  • DOI: https://doi.org/10.1007/978-3-031-47994-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47993-9

  • Online ISBN: 978-3-031-47994-6

  • eBook Packages: Computer Science, Computer Science (R0)
