
Evaluating Attribution Methods for Explainable NLP with Transformers

  • Conference paper
Text, Speech, and Dialogue (TSD 2022)

Abstract

This paper describes an experimental evaluation of several attribution methods on two NLP tasks: sentiment analysis and multi-label document classification. Our motivation is to find the best method for interpreting the decisions of Transformer models. For this purpose, we introduce two new evaluation datasets. The first is derived from the Stanford Sentiment Treebank, where the sentiment of individual words is annotated along with the sentiment of the whole sentence. The second comes from the Czech Text Document Corpus, to which we added keyword information for each category. The keywords were manually assigned to each document and automatically propagated to categories via pointwise mutual information (PMI). We evaluate each attribution method on several models of different sizes. The evaluation results are reasonably consistent across all models and both datasets, indicating that both datasets, together with the proposed evaluation metrics, are suitable for interpretability evaluation. We show how the attribution methods behave with respect to model size and task. We also consider practical applications: while some methods perform well, they can be replaced with slightly worse-performing methods that require significantly less time to compute.
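The keyword propagation mentioned in the abstract can be sketched roughly as follows. This is a hedged illustration, not the authors' released code (their implementation is linked in the notes): the function name `pmi_keywords`, the input format, and the zero threshold are all assumptions, shown only to make the PMI-based propagation concrete.

```python
import math
from collections import Counter

def pmi_keywords(docs, threshold=0.0):
    """Propagate document-level keywords to categories via PMI.

    docs: list of (keywords, categories) pairs, each a set of strings.
    A keyword is assigned to a category when
    PMI(k, c) = log( p(k, c) / (p(k) * p(c)) ) exceeds the threshold.
    """
    n = len(docs)
    kw_count, cat_count, joint_count = Counter(), Counter(), Counter()
    for keywords, categories in docs:
        for k in keywords:
            kw_count[k] += 1
        for c in categories:
            cat_count[c] += 1
        for k in keywords:
            for c in categories:
                joint_count[(k, c)] += 1

    assigned = {}
    for (k, c), n_kc in joint_count.items():
        # Probabilities estimated as document frequencies.
        pmi = math.log((n_kc / n) / ((kw_count[k] / n) * (cat_count[c] / n)))
        if pmi > threshold:
            assigned.setdefault(c, set()).add(k)
    return assigned
```

A positive PMI means the keyword co-occurs with the category more often than chance would predict, so only genuinely category-specific keywords are propagated.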



Notes

  1. Different activation, slightly different layer operation.

  2. https://github.com/aitakaitov/tsd-2022-attributions.



Acknowledgements

This work has been supported by the Technology Agency of the Czech Republic within the ETA Programme - No. TL03000152 “Artificial Intelligence, Media, and Law”, and by Grant No. SGS-2022-016 Advanced methods of data processing and analysis. Computational resources were supplied by the project “e-Infrastruktura CZ” (e-INFRA CZ LM2018140) supported by the Ministry of Education, Youth and Sports of the Czech Republic.

Author information

Corresponding author

Correspondence to Ondřej Pražák.



Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Bartička, V., Pražák, O., Konopík, M., Sido, J. (2022). Evaluating Attribution Methods for Explainable NLP with Transformers. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2022. Lecture Notes in Computer Science, vol. 13502. Springer, Cham. https://doi.org/10.1007/978-3-031-16270-1_1


  • DOI: https://doi.org/10.1007/978-3-031-16270-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16269-5

  • Online ISBN: 978-3-031-16270-1

  • eBook Packages: Computer Science (R0)
