DOI: 10.1145/3366424.3383542

VisBERT: Hidden-State Visualizations for Transformers

Published: 20 April 2020

Abstract

Explainability and interpretability are two important concepts whose absence can, and should, impede the application of well-performing neural networks to real-world problems. At the same time, they are difficult to incorporate into the large black-box models that achieve state-of-the-art results on a multitude of NLP tasks. Bidirectional Encoder Representations from Transformers (BERT) is one such black-box model. It has become a staple architecture for solving many different NLP tasks and has inspired a number of related Transformer models. Understanding how these models draw conclusions is crucial for both their improvement and their application. We contribute to this challenge by presenting VisBERT, a tool for visualizing the contextual token representations within BERT for the task of (multi-hop) Question Answering. Instead of analyzing attention weights, we focus on the hidden states produced by each encoder block within the BERT model. This way we can observe how the semantic representations are transformed throughout the model's layers. VisBERT enables users to gain insight into the model's internal state and to explore its inference steps or potential shortcomings. The tool allows us to identify distinct phases in BERT's transformations that resemble a traditional NLP pipeline, and it offers insights into failed predictions.
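The approach described in the abstract can be sketched in a few lines of NumPy. The model-specific part, running BERT and collecting the output of each encoder block (e.g. via `output_hidden_states=True` in the HuggingFace transformers library), is mocked here with random arrays of the right shape for BERT-base; each layer's token vectors are then projected to 2-D with PCA, one of the dimensionality-reduction methods the paper builds on. The shapes and the helper function are illustrative assumptions, not VisBERT's actual implementation.

```python
import numpy as np

# Mock hidden states shaped like BERT-base output: 13 arrays
# (input embeddings + 12 encoder blocks), each (num_tokens, 768).
# In practice these come from running the model with hidden-state
# output enabled and taking one layer per array.
rng = np.random.default_rng(0)
num_tokens, hidden_dim = 24, 768
hidden_states = [rng.normal(size=(num_tokens, hidden_dim)) for _ in range(13)]

def project_layer(states, n_components=2):
    """Project one layer's token vectors to 2-D with PCA:
    centre the vectors, then keep the top principal directions
    obtained via SVD of the centred matrix."""
    centred = states - states.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

# One 2-D point per token, per layer. Plotting these points layer by
# layer shows how token representations move through the model, which
# is the kind of view a tool like VisBERT displays.
projections = [project_layer(h) for h in hidden_states]
print(projections[0].shape)  # (24, 2)
```

Tracking how the projected token clusters drift and regroup from layer to layer is what reveals the pipeline-like phases the paper reports.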


Published In

WWW '20: Companion Proceedings of The Web Conference 2020
April 2020, 854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424

Publisher

Association for Computing Machinery, New York, NY, United States

Conference

WWW '20: The Web Conference 2020
April 20-24, 2020, Taipei, Taiwan

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
