Abstract
Deep learning models based on the Transformer architecture have achieved impressive state-of-the-art results, and have even surpassed human-level performance, across various natural language processing tasks. However, these models remain opaque and hard to explain due to their vast complexity and size. This limits adoption in highly regulated domains such as medicine and finance, and non-expert end-users often lack trust in their predictions. In this paper, we show that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, we can transfer this capability to a low-resource task in another domain. Our proposed three-step training procedure improves explanation quality by up to 7% and avoids sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
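To make the setup concrete, the sketch below illustrates the text-to-text formulation that such explanation generation builds on, in the spirit of WT5 (Narang et al.): the input is prefixed with an "explain" directive, and the model is trained to emit the label followed by a free-text explanation. The checkpoint name, prompt wording, and example are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the text-to-text explanation format, in the spirit of
# WT5 (Narang et al.): the model outputs a label followed by a free-text
# explanation. The checkpoint, prompt wording, and example are illustrative
# assumptions, not the paper's exact configuration.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Prefixing the input with "explain" signals that an explanation is expected.
source = (
    "explain nli premise: A man is playing a guitar on stage. "
    "hypothesis: A person is performing music."
)
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)

# During training, the target would take the form:
# "entailment explanation: playing a guitar on stage is performing music"
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```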
Notes
1. Code available at https://github.com/Peltarion/explainability_transfer.
2. Since all seq2seq models considered in this work have publicly released checkpoints from language-model pre-training, these checkpoints are used as the starting point for step 2 in Fig. 1 (see the sketch after these notes).
3. We use the dataset versions distributed through the ERASER benchmark [10].
4. The hyperparameter settings for the different models and training phases are available in the public code repository.
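As a rough illustration of the three-step procedure referenced in the notes above, the following sketch wires the steps together with a Hugging Face T5 model: start from a publicly released language-model checkpoint, fine-tune on a large source task annotated with explanations, then fine-tune on the low-resource target task. The fine_tune helper, example strings, and hyperparameters are assumptions for illustration only; the actual settings are in the public code repository.

```python
# Hedged sketch of the three-step recipe: (1) start from a publicly released
# language-model checkpoint, (2) fine-tune on a large source task annotated
# with explanations, (3) fine-tune on the low-resource target task in another
# domain. The fine_tune helper, example strings, and hyperparameters are
# illustrative assumptions, not the paper's actual configuration.
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5Tokenizer


def fine_tune(model, tokenizer, pairs, epochs=1, lr=1e-4):
    """Fine-tune on (source_text, target_text) pairs in text-to-text format."""
    optimizer = AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for src, tgt in pairs:
            batch = tokenizer(src, return_tensors="pt")
            labels = tokenizer(tgt, return_tensors="pt").input_ids
            loss = model(**batch, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model


# Step 1: publicly released checkpoint from language-model pre-training (note 2).
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Step 2: large source dataset with explanation annotations, e.g. e-SNLI
# (Camburu et al.), in the label-plus-explanation target format.
source_pairs = [
    ("explain nli premise: A man plays guitar. hypothesis: A person makes music.",
     "entailment explanation: playing guitar is making music."),
]
model = fine_tune(model, tokenizer, source_pairs)

# Step 3: low-resource target task in another domain (placeholder examples).
target_pairs = [
    ("explain fever claim: ... evidence: ...",
     "supports explanation: ..."),
]
model = fine_tune(model, tokenizer, target_pairs)
```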
References
Bastings, J., et al.: Interpretable neural predictions with differentiable binary variables. In: ACL (2019)
Bowman, S.R., et al.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)
Brunner, G., et al.: On identifiability in transformers. In: ICLR (2019)
Camburu, O., et al.: e-SNLI: natural language inference with natural language explanations. In: NeurIPS (2018)
Chen, C., et al.: This looks like that: Deep learning for interpretable image recognition. In: NeurIPS (2019)
Clark, K., et al.: What does BERT look at? An analysis of BERT's attention. In: BlackboxNLP Workshop at ACL (2019)
Common Crawl. https://www.commoncrawl.org
Danilevsky, M., et al.: A survey of the state of explainable AI for natural language processing. In: AACL-IJCNLP (2020)
Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
DeYoung, J., et al.: ERASER: a benchmark to evaluate rationalized NLP models. In: ACL (2020)
Doshi-Velez, F., et al.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
Ehsan, U., et al.: Rationalization: a neural machine translation approach to generating natural language explanations. In: AIES (2018)
EU: General Data Protection Regulation (GDPR): Recital 71 (2018). https://www.privacy-regulation.eu/en/r71.htm
Guidotti, R., et al.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)
Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1
Jacovi, A., et al.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: ACL (2020)
Jain, S., et al.: An analysis of attention over clinical notes for predictive tasks. In: Clinical NLP (2019)
Jain, S., et al.: Attention is not explanation. In: NAACL (2019)
Khashabi, D., et al.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: NAACL (2018)
Kim, B., et al.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: NIPS (2014)
Kim, J., Rohrbach, A., Darrell, T., Canny, J., Akata, Z.: Textual explanations for self-driving vehicles. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 577–593. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_35
Kovaleva, O., et al.: Revealing the dark secrets of BERT. In: NeurIPS (2019)
Lehman, E., et al.: Inferring which medical treatments work from reports of clinical trials. In: NAACL (2019)
Lei, T., et al.: Rationalizing neural predictions. In: EMNLP (2016)
Letham, B., et al.: Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371 (2015)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL (2020)
Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out (2004)
Lundberg, S., et al.: A unified approach to interpreting model predictions. In: NIPS (2017)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mullenbach, J., et al.: Explainable prediction of medical codes from clinical text. In: NAACL (2018)
Narang, S., et al.: WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546 (2020)
Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Paranjape, B., et al.: An information bottleneck approach for controlling conciseness in rationale extraction. In: EMNLP (2020)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)
Rajani, N., et al.: Explain yourself! Leveraging language models for commonsense reasoning. In: ACL (2019)
Ribeiro, M., et al.: "Why should I trust you?" Explaining the predictions of any classifier. In: KDD (2016)
Serrano, S., et al.: Is attention interpretable? In: ACL (2019)
Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)
Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and verification. In: NAACL (2018)
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Wadden, D., et al.: Fact or fiction: verifying scientific claims. In: EMNLP (2020)
Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: NeurIPS (2019)
Wiegreffe, S., et al.: Attention is not not explanation. In: EMNLP-IJCNLP (2019)
Wiegreffe, S., et al.: Measuring association between labels and free-text rationales. arXiv preprint arXiv:2010.12762 (2020)
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Erliksson, K.F., Arpteg, A., Matskin, M., Payberah, A.H. (2021). Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol. 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_8
DOI: https://doi.org/10.1007/978-3-030-80599-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80598-2
Online ISBN: 978-3-030-80599-9
eBook Packages: Computer Science (R0)