
Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian

  • Conference paper
Artificial Intelligence XL (SGAI 2023)

Abstract

Contextual language models, such as transformers, can solve a wide range of language tasks, ranging from text classification to question answering and machine translation. Like many deep learning models, their performance depends heavily on the quality and amount of data available for training. This poses a problem for low-resource languages, such as Norwegian, which cannot provide the necessary amount of training data. In this article, we investigate the use of multilingual models as a step toward overcoming the data sparsity problem for minority languages. Specifically, we study how words are represented by multilingual BERT models across two languages of interest: English and Norwegian. Our analysis shows that multilingual models encode English-Norwegian word pairs similarly, automatically aligning semantics across languages without supervision. It also shows that the embedding of a word encodes information about the language to which it belongs. We therefore believe that, in pre-trained multilingual models, knowledge from one language can be transferred to another without direct supervision, helping to solve the data sparsity problem for minority languages.


Notes

  1. https://huggingface.co/datasets. Visited: 19.01.2023.

  2. https://huggingface.co/bert-base-multilingual-cased.

  3. https://www.nb.no/sprakbanken/ressurskatalog/oai-nb-no-sbr-4/.

  4. https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-23/.

  5. https://github.com/facebookresearch/MUSE.

  6. https://www.nltk.org/nltk_data/.

  7. https://www.elrc-share.eu/repository/browse/bilingual-english-norwegian-parallel-corpus-from-the-office-of-the-auditor-general-riksrevisjonen-website/a5d2470201e311e9b7d400155d0267060fffdc9258a741659ce9e52ef15a7c26/.


Author information


Corresponding author

Correspondence to Fabrizio Palumbo.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Aaby, P., Biermann, D., Yazidi, A., Mello, G.B.M.e., Palumbo, F. (2023). Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian. In: Bramer, M., Stahl, F. (eds) Artificial Intelligence XL. SGAI 2023. Lecture Notes in Computer Science, vol 14381. Springer, Cham. https://doi.org/10.1007/978-3-031-47994-6_4


  • DOI: https://doi.org/10.1007/978-3-031-47994-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47993-9

  • Online ISBN: 978-3-031-47994-6

  • eBook Packages: Computer Science, Computer Science (R0)
