
Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12801)

Abstract

Deep learning models based on the Transformer architecture have achieved impressive state-of-the-art results and even surpassed human-level performance on various natural language processing tasks. However, these models remain opaque and hard to explain due to their vast complexity and size. This limits adoption in highly regulated domains such as medicine and finance, and it often erodes trust among non-expert end-users. In this paper, we show that by teaching a model to generate explanations alongside its predictions on a large annotated dataset, we can transfer this capability to a low-resource task in another domain. Our proposed three-step training procedure improves explanation quality by up to 7% and avoids sacrificing classification performance on the downstream task, while at the same time reducing the need for human annotations.
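
As a concrete illustration of the training signal described in the abstract, the sketch below fine-tunes a pre-trained text-to-text model to emit a label followed by a free-text rationale, in the WT5 style [32] that this work builds on. The prompt format, the example sentences, and the model choice (t5-base via the Hugging Face transformers library) are illustrative assumptions, not the authors' exact setup; their code is linked in the notes below.

    # Minimal sketch (not the authors' pipeline): fine-tune a seq2seq model to produce
    # "label explanation: rationale" targets, WT5-style. The prompt format is an assumption.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-base")
    model = T5ForConditionalGeneration.from_pretrained("t5-base")

    # Hypothetical NLI example with a human-written rationale (e.g. in the spirit of e-SNLI [4]).
    source = ("explain nli hypothesis: A person is performing music. "
              "premise: A man is playing a guitar on stage.")
    target = "entailment explanation: playing a guitar on stage is a way of performing music."

    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids

    # Standard seq2seq cross-entropy; an optimizer step would follow in a real training loop.
    loss = model(**inputs, labels=labels).loss
    loss.backward()

    # At inference time, the generated string is split into the predicted label
    # and its natural-language explanation.
    generated = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Because the explanation is produced as ordinary output text, the same input/target format can be reused on a low-resource task in another domain, which is what makes the explanation capability transferable in the proposed three-step procedure.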


Notes

  1. Code available at https://github.com/Peltarion/explainability_transfer.

  2. Since all seq2seq models considered in this work have publicly released checkpoints from language-model pre-training, these are used as the starting point for step 2 in Fig. 1.

  3. We use the dataset versions distributed through the ERASER benchmark [10]; a data-loading sketch follows these notes.

  4. The hyperparameter settings for the different models and training phases are available in the public code repository.
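
For note 3, a rough sketch of reading one ERASER-distributed split is shown below. The directory layout and the field names ("classification", "evidences") are assumptions about the benchmark's JSONL release format rather than details taken from this paper.

    # Hypothetical loader for an ERASER-style JSONL split; path and field names are assumed.
    import json
    from pathlib import Path

    def load_split(path):
        """Yield one annotated example per line of a JSONL file."""
        with Path(path).open() as f:
            for line in f:
                yield json.loads(line)

    # "movies/train.jsonl" is a placeholder for a locally downloaded ERASER split.
    for example in load_split("movies/train.jsonl"):
        label = example.get("classification")
        rationales = example.get("evidences", [])
        # ...pair the document text with its label and human rationale here...
        break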

References

  1. Bastings, J., et al.: Interpretable neural predictions with differentiable binary variables. In: ACL (2019)

  2. Bowman, S.R., et al.: A large annotated corpus for learning natural language inference. In: EMNLP (2015)

  3. Brunner, G., et al.: On identifiability in transformers. In: ICLR (2019)

  4. Camburu, O., et al.: e-SNLI: natural language inference with natural language explanations. In: NeurIPS (2018)

  5. Chen, C., et al.: This looks like that: deep learning for interpretable image recognition. In: NeurIPS (2019)

  6. Clark, K., et al.: What does BERT look at? An analysis of BERT's attention. In: ACL BlackboxNLP Workshop (2019)

  7. Common Crawl. https://www.commoncrawl.org

  8. Danilevsky, M., et al.: A survey of the state of explainable AI for natural language processing. In: AACL-IJCNLP (2020)

  9. Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL (2019)

  10. DeYoung, J., et al.: ERASER: a benchmark to evaluate rationalized NLP models. In: ACL (2020)

  11. Doshi-Velez, F., et al.: Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017)

  12. Ehsan, U., et al.: Rationalization: a neural machine translation approach to generating natural language explanations. In: AIES (2018)

  13. EU: General Data Protection Regulation (GDPR): Recital 71 (2018). https://www.privacy-regulation.eu/en/r71.htm

  14. Guidotti, R., et al.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)

  15. He, K., et al.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV (2015)

  16. Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1

  17. Jacovi, A., et al.: Towards faithfully interpretable NLP systems: how should we define and evaluate faithfulness? In: ACL (2020)

  18. Jain, S., et al.: An analysis of attention over clinical notes for predictive tasks. In: Clinical NLP (2019)

  19. Jain, S., et al.: Attention is not explanation. In: NAACL (2019)

  20. Khashabi, D., et al.: Looking beyond the surface: a challenge set for reading comprehension over multiple sentences. In: NAACL (2018)

  21. Kim, B., et al.: The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: NIPS (2014)

  22. Kim, J., Rohrbach, A., Darrell, T., Canny, J., Akata, Z.: Textual explanations for self-driving vehicles. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 577–593. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_35

  23. Kovaleva, O., et al.: Revealing the dark secrets of BERT. In: NeurIPS (2019)

  24. Lehman, E., et al.: Inferring which medical treatments work from reports of clinical trials. In: NAACL (2019)

  25. Lei, T., et al.: Rationalizing neural predictions. In: EMNLP (2016)

  26. Letham, B., et al.: Interpretable classifiers using rules and Bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9, 1350–1371 (2015)

  27. Lewis, M., et al.: BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In: ACL (2020)

  28. Lin, C.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out (2004)

  29. Lundberg, S., et al.: A unified approach to interpreting model predictions. In: NIPS (2017)

  30. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)

  31. Mullenbach, J., et al.: Explainable prediction of medical codes from clinical text. In: NAACL (2018)

  32. Narang, S., et al.: WT5?! Training text-to-text models to explain their predictions. arXiv preprint arXiv:2004.14546 (2020)

  33. Papineni, K., et al.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)

  34. Paranjape, B., et al.: An information bottleneck approach for controlling conciseness in rationale extraction. In: EMNLP (2020)

  35. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. JMLR (2020)

  36. Rajani, N., et al.: Explain yourself! Leveraging language models for commonsense reasoning. In: ACL (2019)

  37. Ribeiro, M., et al.: Why should I trust you? Explaining the predictions of any classifier. In: KDD (2016)

  38. Serrano, S., et al.: Is attention interpretable? In: ACL (2019)

  39. Sundararajan, M., et al.: Axiomatic attribution for deep networks. In: ICML (2017)

  40. Thorne, J., et al.: FEVER: a large-scale dataset for fact extraction and verification. In: NAACL (2018)

  41. Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)

  42. Wadden, D., et al.: Fact or fiction: verifying scientific claims. In: EMNLP (2020)

  43. Wang, A., et al.: SuperGLUE: a stickier benchmark for general-purpose language understanding systems. In: NeurIPS (2019)

  44. Wiegreffe, S., et al.: Attention is not not explanation. In: EMNLP-IJCNLP (2019)

  45. Wiegreffe, S., et al.: Measuring association between labels and free-text rationales. arXiv preprint arXiv:2010.12762 (2020)

  46. Xue, L., et al.: mT5: a massively multilingual pre-trained text-to-text transformer. arXiv preprint arXiv:2010.11934 (2020)


Author information


Correspondence to Karl Fredrik Erliksson.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Erliksson, K.F., Arpteg, A., Matskin, M., Payberah, A.H. (2021). Cross-Domain Transfer of Generative Explanations Using Text-to-Text Models. In: Métais, E., Meziane, F., Horacek, H., Kapetanios, E. (eds) Natural Language Processing and Information Systems. NLDB 2021. Lecture Notes in Computer Science, vol. 12801. Springer, Cham. https://doi.org/10.1007/978-3-030-80599-9_8


  • DOI: https://doi.org/10.1007/978-3-030-80599-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80598-2

  • Online ISBN: 978-3-030-80599-9

  • eBook Packages: Computer Science, Computer Science (R0)
