Abstract
Deep learning models based on the Transformer architecture have achieved impressive state-of-the-art results, and even surpassed human-level performance, on various natural language processing tasks. However, their vast complexity and size make these models opaque and hard to explain, which limits adoption in highly regulated domains such as medicine and finance and often undermines trust among non-expert end users. In this paper, we show that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, we can transfer this capability to a low-resource task in another domain. Our proposed three-step training procedure improves explanation quality by up to 7% and avoids sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
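The approach casts classification-with-explanation as a text-to-text problem in the style of WT5 [48]: the input is prefixed with an "explain" tag, and the target concatenates the predicted label with a free-text explanation. The sketch below illustrates that serialization; the function name, field names, and example sentences are illustrative and not taken from the paper.

```python
def to_text_to_text(task, fields, label, explanation=None, with_explanation=True):
    """Serialize a labeled example into an (input, target) pair of plain strings.

    When with_explanation is True, the task prefix becomes "explain <task>"
    and the target appends the free-text explanation after the label,
    mirroring the WT5-style convention.
    """
    prefix = f"explain {task}" if with_explanation else task
    source = prefix + " " + " ".join(f"{k}: {v}" for k, v in fields.items())
    target = label
    if with_explanation and explanation:
        target += f" explanation: {explanation}"
    return source, target


# Example: an e-SNLI-style [4] natural language inference instance.
src, tgt = to_text_to_text(
    "nli",
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person makes music."},
    label="entailment",
    explanation="Playing a guitar is a way of making music.",
)
print(src)  # explain nli premise: A man is playing a guitar. hypothesis: A person makes music.
print(tgt)  # entailment explanation: Playing a guitar is a way of making music.
```

Dropping the "explain" prefix (with_explanation=False) recovers a plain classification example, which is how the same seq2seq model can be trained with or without explanation supervision in the different phases.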
Notes
1. Code available at https://github.com/Peltarion/explainability_transfer.
2. All seq2seq models considered in this work have publicly released checkpoints from language-model pre-training; these checkpoints are used as the starting point for step 2 in Fig. 1.
3. We use the dataset versions distributed through the ERASER benchmark [10].
4. The hyperparameter settings for the different models and training phases are available in the public code repository.
References
Bastings, J., et al.: Interpretable neural predictions with differentiable binary variables. In: ACL (2019)
Bowman, S.R., et al.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)
Brunner, G., et al.: On identifiability in transformers. In: ICLR (2019)
Camburu, O., et al.: e-SNLI: natural language inference with natural language explanations. In: NeurIPS (2018)
Chen, C., et al.: This looks like that: deep learning for interpretable image recognition. In: NeurIPS (2019)
Clark, K., et al.: What does BERT look at? An analysis of BERT's attention. In: ACL BlackboxNLP Workshop (2019)
Common Crawl. https://www.commoncrawl.org
Danilevsky, M., et al.: A survey of the state of explainable AI for natural language processing. In: AACL-IJCNLP (2020)
Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
DeYoung, J., et al.: ERASER: a benchmark to evaluate rationalized NLP models. In: ACL (2020)
Doshi-Velez, F., et al.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
Ehsan, U., et al.: Rationalization: a neural machine translation approach to generating natural language explanations. In: AIES (2018)
EU: General Data Protection Regulation (GDPR): Recital 71 (2018). https://www.privacy-regulation.eu/en/r71.htm
Guidotti, R., et al.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)
Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1
Jacovi, A., et al.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: ACL (2020)
Jain, S., et al.: An analysis of attention over clinical notes for predictive tasks. In: Clinical NLP (2019)
Jain, S., et al.: Attention is not explanation. In: NAACL (2019)
Khashabi, D., et al.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: NAACL (2018)
Kim, B., et al.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: NIPS (2014)
Kim, J., Rohrbach, A., Darrell, T., Canny, J., Akata, Z.: Textual explanations for self-driving vehicles. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 577–593. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_35
Kovaleva, O., et al.: Revealing the dark secrets of BERT. In: NeurIPS (2019)
Lehman, E., et al.: Inferring which medical treatments work from reports of clinical trials. In: NAACL (2019)
Lei, T., et al.: Rationalizing neural predictions. In: EMNLP (2016)
Letham, B., et al.: Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371 (2015)
Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL (2020)
Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out (2004)
Lundberg, S., et al.: A unified approach to interpreting model predictions. In: NIPS (2017)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
Mullenbach, J., et al.: Explainable prediction of medical codes from clinical text. In: NAACL (2018)
Narang, S., et al.: WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546 (2020)
Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
Paranjape, B., et al.: An information bottleneck approach for controlling conciseness in rationale extraction. In: EMNLP (2020)
Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)
Rajani, N., et al.: Explain yourself! Leveraging language models for commonsense reasoning. In: ACL (2019)
Ribeiro, M., et al.: "Why should I trust you?" Explaining the predictions of any classifier. In: KDD (2016)
Serrano, S., et al.: Is attention interpretable? In: ACL (2019)
Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)
Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and verification. In: NAACL (2018)
Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
Wadden, D., et al.: Fact or fiction: verifying scientific claims. In: EMNLP (2020)
Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: NeurIPS (2019)
Wiegreffe, S., et al.: Attention is not not explanation. In: EMNLP-IJCNLP (2019)
Wiegreffe, S., et al.: Measuring association between labels and free-text rationales. arXiv preprint arXiv:2010.12762 (2020)
Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)
© 2021 Springer Nature Switzerland AG
Cite this paper
Erliksson, K.F., Arpteg, A., Matskin, M., Payberah, A.H. (2021). Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science(), vol 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_8
Print ISBN: 978-3-030-80598-2
Online ISBN: 978-3-030-80599-9