DOI: 10.1145/3366424.3383542

VisBERT: Hidden-State Visualizations for Transformers

Published: 20 April 2020

Abstract

Explainability and interpretability are two important concepts whose absence can, and should, impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large black-box models that achieve state-of-the-art results on a multitude of NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) is one such black-box model. It has become a staple architecture for solving many different NLP tasks and has inspired a number of related Transformer models. Understanding how these models draw conclusions is crucial for both their improvement and their application. We contribute to this challenge by presenting VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering. Instead of analyzing attention weights, we focus on the hidden states produced by each encoder block within the BERT model. This way we can observe how the semantic representations are transformed throughout the model's layers. VisBERT enables users to gain insight into the model's internal state and to explore its inference steps or potential shortcomings. The tool allows us to identify distinct phases in BERT's transformations that resemble a traditional NLP pipeline, and it offers insights into failed predictions.
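The approach described in the abstract can be sketched in a few lines of NumPy. The model-specific part, running BERT and collecting the output of each encoder block (e.g. via `output_hidden_states=True` in the HuggingFace transformers library), is mocked here with random arrays of the right shape for BERT-base; each layer's token vectors are then projected to 2-D with PCA, one of the dimensionality-reduction methods the paper builds on. The shapes and the helper function are illustrative assumptions, not VisBERT's actual implementation.

```python
import numpy as np

# Mock hidden states shaped like BERT-base output: 13 arrays
# (input embeddings + 12 encoder blocks), each (num_tokens, 768).
# In practice these come from running the model with hidden-state
# output enabled and taking one layer per array.
rng = np.random.default_rng(0)
num_tokens, hidden_dim = 24, 768
hidden_states = [rng.normal(size=(num_tokens, hidden_dim)) for _ in range(13)]

def project_layer(states, n_components=2):
    """Project one layer's token vectors to 2-D with PCA:
    centre the vectors, then keep the top principal directions
    obtained via SVD of the centred matrix."""
    centred = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

# One 2-D point per token, per layer. Plotting these points layer by
# layer shows how token representations move through the model, which
# is the kind of view a tool like VisBERT displays.
projections = [project_layer(h) for h in hidden_states]
print(projections[0].shape)  # (24, 2)
```

Tracking how the projected token clusters drift and regroup from layer to layer is what reveals the pipeline-like phases the paper reports.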


Published In

WWW '20: Companion Proceedings of The Web Conference 2020
April 2020, 854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

WWW '20: The Web Conference 2020
April 20-24, 2020, Taipei, Taiwan

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
