Abstract
Deep learning models based on the Transformer architecture have achieved impressive state-of-the-art results, and have even surpassed human-level performance, across various natural language processing tasks. However, these models remain opaque and hard to explain due to their vast complexity and size. This limits adoption in highly regulated domains such as medicine and finance, and non-expert end-users often lack trust in their predictions. In this paper, we show that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, we can transfer this capability to a low-resource task in another domain. Our proposed three-step training procedure improves explanation quality by up to 7% and avoids sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
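To make the setup concrete, the sketch below illustrates the text-to-text formulation that such explanation generation builds on, in the spirit of WT5 (Narang et al.): the input is prefixed with an "explain" directive, and the model is trained to emit the label followed by a free-text explanation. The checkpoint name, prompt wording, and example are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the text-to-text explanation format, in the spirit of
# WT5 (Narang et al.): the model outputs a label followed by a free-text
# explanation. The checkpoint, prompt wording, and example are illustrative
# assumptions, not the paper's exact configuration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Prefixing the input with "explain" signals that an explanation is expected.
source = (
    "explain nli premise: A man is playing a guitar on stage. "
    "hypothesis: A person is performing music."
)
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# During training, the target would take the form:
# "entailment explanation: playing a guitar on stage is performing music"
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```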
Notes
1. Code available at https://github.com/Peltarion/explainability_transfer.
2. Since all seq2seq models considered in this work have publicly released checkpoints from language-model pre-training, these checkpoints are used as the starting point for step 2 in Fig. 1 (see the sketch after these notes).
3. We use the dataset versions distributed through the ERASER benchmark [10].
4. The hyperparameter settings for the different models and training phases are available in the public code repository.
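As a rough illustration of the three-step procedure referenced in the notes above, the following sketch wires the steps together with a Hugging Face T5 model: start from a publicly released language-model checkpoint, fine-tune on a large source task annotated with explanations, then fine-tune on the low-resource target task. The fine_tune helper, example strings, and hyperparameters are assumptions for illustration only; the actual settings are in the public code repository.

```python
# Hedged sketch of the three-step recipe: (1) start from a publicly released
# language-model checkpoint, (2) fine-tune on a large source task annotated
# with explanations, (3) fine-tune on the low-resource target task in another
# domain. The fine_tune helper, example strings, and hyperparameters are
# illustrative assumptions, not the paper's actual configuration.
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5Tokenizer


def fine_tune(model, tokenizer, pairs, epochs=1, lr=1e-4):
    """Fine-tune on (source_text, target_text) pairs in text-to-text format."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tokenizer(src, return_tensors="pt")
            labels = tokenizer(tgt, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model


# Step 1: publicly released checkpoint from language-model pre-training (note 2).
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Step 2: large source dataset with explanation annotations, e.g. e-SNLI
# (Camburu et al.), in the label-plus-explanation target format.
source_pairs = [
    ("explain nli premise: A man plays guitar. hypothesis: A person makes music.",
     "entailment explanation: playing guitar is making music."),
]
model = fine_tune(model, tokenizer, source_pairs)

# Step 3: low-resource target task in another domain (placeholder examples).
target_pairs = [
    ("explain fever claim: ... evidence: ...",
     "supports explanation: ..."),
]
model = fine_tune(model, tokenizer, target_pairs)
```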
References
Bastings, J., et al.: Interpretable neural predictions with differentiable binary variables. In: ACL (2019)
Bowman, S.R., et al.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)
Brunner, G., et al.: On identifiability in transformers. In: ICLR (2019)
Camburu, O., et al.: e-SNLI: natural language inference with natural language explanations. In: NeurIPS (2018)
Chen, C., et al.: This looks like that: Deep learning for interpretable image recognition. In: NeurIPS (2019)
Clark, K., et al.: What does BERT look at? An analysis of BERT's attention. In: BlackboxNLP Workshop at ACL (2019)
Common Crawl. https://www.commoncrawl.org
Danilevsky, M., et al.: A survey of the state of explainable AI for natural language processing. In: AACL-IJCNLP (2020)
Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
DeYoung, J., et al.: ERASER: a benchmark to evaluate rationalized NLP models. In: ACL (2020)
Doshi-Velez, F., et al.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
Ehsan, U., et al.: Rationalization: a neural machine translation approach to generating natural language explanations. In: AIES (2018)
EU: General Data Protection Regulation (GDPR): Recital 71 (2018). https://www.privacy-regulation.eu/en/r71.htm
Guidotti, R., et al.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)
Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1
Jacovi, A., et al.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: ACL (2020)
Jain, S., et al.: An analysis of attention over clinical notes for predictive tasks. In: Clinical NLP (2019)
Jain, S., et al.: Attention is not explanation. In: NAACL (2019)
Khashabi, D., et al.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: NAACL (2018)
Kim, B., et al.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: NIPS (2014)
Kim, J., Rohrbach, A., Darrell, T., Canny, J., Akata, Z.: Textual explanations for self-driving vehicles. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 577–593. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_35
Kovaleva, O., et al.: Revealing the dark secrets of BERT. In: NeurIPS (2019)
Lehman, E., et al.: Inferring which medical treatments work from reports of clinical trials. In: NAACL (2019)
Lei, T., et al.: Rationalizing neural predictions. In: EMNLP (2016)
Letham, B., et al.: Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371 (2015)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL (2020)
Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out (2004)
Lundberg, S., et al.: A unified approach to interpreting model predictions. In: NIPS (2017)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mullenbach, J., et al.: Explainable prediction of medical codes from clinical text. In: NAACL (2018)
Narang, S., et al.: WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546 (2020)
Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Paranjape, B., et al.: An information bottleneck approach for controlling conciseness in rationale extraction. In: EMNLP (2020)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)
Rajani, N., et al.: Explain yourself! Leveraging language models for commonsense reasoning. In: ACL (2019)
Ribeiro, M., et al.: "Why should I trust you?" Explaining the predictions of any classifier. In: KDD (2016)
Serrano, S., et al.: Is attention interpretable? In: ACL (2019)
Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)
Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and verification. In: NAACL (2018)
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Wadden, D., et al.: Fact or fiction: verifying scientific claims. In: EMNLP (2020)
Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: NeurIPS (2019)
Wiegreffe, S., et al.: Attention is not not explanation. In: EMNLP-IJCNLP (2019)
Wiegreffe, S., et al.: Measuring association between labels and free-text rationales. arXiv preprint arXiv:2010.12762 (2020)
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Erliksson, K.F., Arpteg, A., Matskin, M., Payberah, A.H. (2021). Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol. 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_8
DOI: https://doi.org/10.1007/978-3-030-80599-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80598-2
Online ISBN: 978-3-030-80599-9
eBook Packages: Computer Science (R0)