Abstract
Over the last decade, neural word embeddings have become a cornerstone of many important text mining applications such as text classification, sentiment analysis, named entity recognition, and question answering. In particular, Transformer-based contextual word embeddings have gained much attention, with several works attempting to understand how such models work through supervised probing tasks, usually focusing on BERT. In this paper, we propose a fully unsupervised approach to analyzing Transformer-based embedding models in their bare state, with no fine-tuning. More precisely, we focus on characterizing and identifying groups of Transformer layers across 6 different Transformer models.
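As an illustration of what analyzing pretrained layers "in their bare state" can look like, the following is a minimal sketch that extracts layer-wise token representations from a pretrained model (no fine-tuning) and groups layers by similarity. The model name, example sentences, the RV coefficient (Robert and Escoufier, listed in the references) as the similarity measure, and the hierarchical-clustering step are illustrative assumptions, not the exact protocol of the paper.

```python
# Sketch: compare the layer outputs of a pretrained Transformer (no fine-tuning)
# with the RV coefficient, then group similar layers by hierarchical clustering.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from scipy.cluster.hierarchy import linkage, fcluster

def rv_coefficient(x: np.ndarray, y: np.ndarray) -> float:
    """RV coefficient between two (n_tokens, dim) representation matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    xxt, yyt = x @ x.T, y @ y.T
    return float(np.trace(xxt @ yyt) / np.sqrt(np.trace(xxt @ xxt) * np.trace(yyt @ yyt)))

# Illustrative model and sentences, not the paper's corpus
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = ["The bank raised interest rates.", "She sat by the river bank."]
with torch.no_grad():
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    hidden = model(**enc).hidden_states  # embedding layer + one tensor per Transformer layer

# Flatten each layer's token vectors into an (n_tokens, dim) matrix, ignoring padding
mask = enc["attention_mask"].bool()
layers = [h[mask].numpy() for h in hidden]

# Pairwise layer similarities, turned into distances for agglomerative clustering
n = len(layers)
sim = np.array([[rv_coefficient(layers[i], layers[j]) for j in range(n)] for i in range(n)])
dist = 1.0 - sim[np.triu_indices(n, k=1)]      # condensed distance vector
groups = fcluster(linkage(dist, method="average"), t=3, criterion="maxclust")
print("Layer groups:", groups)
```

The number of groups (here 3) and the averaging linkage are placeholders; the point is only that layer similarity can be measured and clustered with no labels or fine-tuning involved.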
Notes
1. This could be explained by the parameter-sharing technique used to train the ALBERT model, which consists of reusing the same parameters across all layers [5].
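As a rough illustration of cross-layer parameter sharing (a sketch using a generic PyTorch block, not ALBERT's actual module), the same set of weights can simply be applied at every layer:

```python
# Cross-layer parameter sharing: one block, applied repeatedly as every "layer"
import torch

shared_block = torch.nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
x = torch.randn(2, 10, 128)   # (batch, tokens, hidden); toy dimensions
for _ in range(12):           # 12 layers, but a single set of parameters
    x = shared_block(x)
```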
References
van Aken, B., Winter, B., Löser, A., Gers, F.A.: How does BERT answer questions? A layer-wise analysis of transformer representations. In: CIKM, pp. 1823–1832 (2019)
Clark, K., Khandelwal, U., Levy, O., Manning, C.D.: What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341 (2019)
Ethayarajh, K., Duvenaud, D., Hirst, G.: Understanding undesirable word embedding associations. arXiv preprint arXiv:1908.06361 (2019)
Goldberg, Y.: Assessing BERT’s syntactic abilities. arXiv preprint arXiv:1901.05287 (2019)
Hao, Y., Dong, L., Wei, F., Xu, K.: Visualizing and understanding the effectiveness of BERT. arXiv preprint arXiv:1908.05620 (2019)
Jawahar, G., Sagot, B., Seddah, D.: What does BERT learn about the structure of language? In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics (2019)
Kovaleva, O., Romanov, A., Rogers, A., Rumshisky, A.: Revealing the dark secrets of BERT. arXiv preprint arXiv:1908.08593 (2019)
Liu, N.F., Gardner, M., Belinkov, Y., Peters, M.E., Smith, N.A.: Linguistic knowledge and transferability of contextual representations. arXiv preprint arXiv:1903.08855 (2019)
Peters, M.E., Neumann, M., Zettlemoyer, L., Yih, W.T.: Dissecting contextual word embeddings: architecture and representation. arXiv preprint arXiv:1808.08949 (2018)
Robert, P., Escoufier, Y.: A unifying tool for linear multivariate statistical methods: the RV-coefficient. J. R. Stat. Soc. 25(3), 257–265 (1976)
Strehl, A., Ghosh, J.: Cluster ensembles–a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
Tenney, I., Das, D., Pavlick, E.: BERT rediscovers the classical NLP pipeline. arXiv preprint arXiv:1905.05950 (2019)
Vial, L., Lecouteux, B., Schwab, D.: UFSAC: unification of sense annotated corpora and tools. In: Language Resources and Evaluation Conference (LREC) (2018)
Zhang, X., Zhao, J., LeCun, Y.: Character-level convolutional networks for text classification. In: Advances in Neural Information Processing Systems (2015)
Cite this paper
Ait Saada, M., Role, F., Nadif, M. (2021). Unsupervised Methods for the Study of Transformer Embeddings. In: Abreu, P.H., Rodrigues, P.P., Fernández, A., Gama, J. (eds) Advances in Intelligent Data Analysis XIX. IDA 2021. Lecture Notes in Computer Science(), vol 12695. Springer, Cham. https://doi.org/10.1007/978-3-030-74251-5_23