
Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models

  • Conference paper
  • In: Natural Language Processing and Information Systems (NLDB 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12801)

Abstract

Deep learning models based on the Transformer architecture have achieved impressive state-of-the-art results, even surpassing human-level performance across various natural language processing tasks. However, these models remain opaque and hard to explain due to their vast size and complexity. This limits their adoption in highly regulated domains such as medicine and finance and often leads to a lack of trust from non-expert end users. In this paper, we show that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, we can transfer this capability to a low-resource task in another domain. Our proposed three-step training procedure improves explanation quality by up to 7% and avoids sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
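As a rough illustration of the text-to-text setup the abstract describes (not the authors' exact data format), the sketch below casts one example so that the target sequence contains both the label and a free-text explanation, letting a single seq2seq model learn to produce predictions and explanations together. The "explain nli" task prefix is an assumption borrowed from the WT5 formulation [32], and the Hugging Face t5-base checkpoint is only one possible backbone.

```python
# Minimal sketch (assumptions: WT5-style "explain ..." prefix, Hugging Face
# t5-base backbone; not necessarily the exact format used in the paper).
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Input: task prefix plus the example text. Target: label followed by an
# explanation, so one seq2seq model produces both in a single output sequence.
source = ("explain nli premise: A man is playing a guitar on stage. "
          "hypothesis: A person is performing music.")
target = "entailment explanation: Playing a guitar on stage is performing music."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Standard teacher-forced training step on the (input, label + explanation) pair.
loss = model(**inputs, labels=labels).loss
loss.backward()

# At inference time the same model generates the label and explanation jointly.
generated = model.generate(**inputs, max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```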


Notes

  1. Code available at https://github.com/Peltarion/explainability_transfer.

  2. Since all seq2seq models considered in this work have publicly released checkpoints from language model pre-training, these are used as the starting point for step 2 in Fig. 1 (see the sketch after these notes).

  3. We use the dataset versions distributed through the ERASER benchmark [10].

  4. The hyperparameter settings for the different models and training phases are available in the public code repository.
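To make note 2 concrete, here is a minimal sketch of loading publicly released language-model pre-training checkpoints as the starting point for step 2. The model identifiers below are standard Hugging Face names chosen for illustration; this excerpt does not state which seq2seq models or sizes the paper actually uses.

```python
# Illustrative only: publicly released pre-training checkpoints serve as the
# initialisation for the explanation-generation fine-tuning step (step 2).
# The specific checkpoint names are assumptions, not the paper's configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoints = {
    "t5": "t5-base",
    "bart": "facebook/bart-base",
    "mt5": "google/mt5-base",
}

models = {}
for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
    models[name] = (tokenizer, model)  # ready for step 2 fine-tuning
```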

References

  1. Bastings, J., et al.: Interpretable neural predictions with differentiable binary variables. In: ACL (2019)
  2. Bowman, S.R., et al.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)
  3. Brunner, G., et al.: On identifiability in transformers. In: ICLR (2019)
  4. Camburu, O., et al.: e-SNLI: natural language inference with natural language explanations. In: NeurIPS (2018)
  5. Chen, C., et al.: This looks like that: deep learning for interpretable image recognition. In: NeurIPS (2019)
  6. Clark, K., et al.: What does BERT look at? An analysis of BERT's attention. In: ACL BlackboxNLP Workshop (2019)
  7. Common Crawl. https://www.commoncrawl.org
  8. Danilevsky, M., et al.: A survey of the state of explainable AI for natural language processing. In: AACL-IJCNLP (2020)
  9. Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)
  10. DeYoung, J., et al.: ERASER: a benchmark to evaluate rationalized NLP models. In: ACL (2020)
  11. Doshi-Velez, F., et al.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)
  12. Ehsan, U., et al.: Rationalization: a neural machine translation approach to generating natural language explanations. In: AIES (2018)
  13. EU: General Data Protection Regulation (GDPR): Recital 71 (2018). https://www.privacy-regulation.eu/en/r71.htm
  14. Guidotti, R., et al.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
  15. He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)
  16. Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1
  17. Jacovi, A., et al.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: ACL (2020)
  18. Jain, S., et al.: An analysis of attention over clinical notes for predictive tasks. In: Clinical NLP (2019)
  19. Jain, S., et al.: Attention is not explanation. In: NAACL (2019)
  20. Khashabi, D., et al.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: NAACL (2018)
  21. Kim, B., et al.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: NIPS (2014)
  22. Kim, J., Rohrbach, A., Darrell, T., Canny, J., Akata, Z.: Textual explanations for self-driving vehicles. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 577–593. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_35
  23. Kovaleva, O., et al.: Revealing the dark secrets of BERT. In: NeurIPS (2019)
  24. Lehman, E., et al.: Inferring which medical treatments work from reports of clinical trials. In: NAACL (2019)
  25. Lei, T., et al.: Rationalizing neural predictions. In: EMNLP (2016)
  26. Letham, B., et al.: Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371 (2015)
  27. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL (2020)
  28. Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out (2004)
  29. Lundberg, S., et al.: A unified approach to interpreting model predictions. In: NIPS (2017)
  30. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
  31. Mullenbach, J., et al.: Explainable prediction of medical codes from clinical text. In: NAACL (2018)
  32. Narang, S., et al.: WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546 (2020)
  33. Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
  34. Paranjape, B., et al.: An information bottleneck approach for controlling conciseness in rationale extraction. In: EMNLP (2020)
  35. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)
  36. Rajani, N., et al.: Explain yourself! Leveraging language models for commonsense reasoning. In: ACL (2019)
  37. Ribeiro, M., et al.: "Why should I trust you?" Explaining the predictions of any classifier. In: KDD (2016)
  38. Serrano, S., et al.: Is attention interpretable? In: ACL (2019)
  39. Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)
  40. Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and verification. In: NAACL (2018)
  41. Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
  42. Wadden, D., et al.: Fact or fiction: verifying scientific claims. In: EMNLP (2020)
  43. Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: NeurIPS (2019)
  44. Wiegreffe, S., et al.: Attention is not not explanation. In: EMNLP-IJCNLP (2019)
  45. Wiegreffe, S., et al.: Measuring association between labels and free-text rationales. arXiv preprint arXiv:2010.12762 (2020)
  46. Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)


Author information

Corresponding author: Karl Fredrik Erliksson.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Erliksson, K.F., Arpteg, A., Matskin, M., Payberah, A.H. (2021). Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds.) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol. 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_8


  • DOI: https://doi.org/10.1007/978-3-030-80599-9_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80598-2

  • Online ISBN: 978-3-030-80599-9

  • eBook Packages: Computer Science, Computer Science (R0)
