
  • Perspective

Multiple stakeholders drive diverse interpretability requirements for machine learning in healthcare

Abstract

Applications of machine learning are becoming increasingly common in medicine and healthcare, enabling more accurate predictive models. However, this accuracy often comes at the cost of interpretability, limiting the clinical impact of machine learning methods. To realize the potential of machine learning in healthcare, it is critical to understand such models from the perspectives of multiple stakeholders and from various angles, which necessitates different types of explanation. In this Perspective, we explore five fundamentally different types of post-hoc machine learning interpretability. We highlight the different types of information that they provide and describe when each can be useful. We examine the various stakeholders in healthcare, delving into their specific objectives, requirements and goals. We discuss how current notions of interpretability can help meet these needs and what is required for each stakeholder to make machine learning models clinically impactful. Finally, to facilitate adoption, we release an open-source interpretability library containing implementations of the different types of interpretability, including tools for visualizing and exploring the explanations.
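To make "post-hoc interpretability" concrete, the following is a minimal sketch of one widely used form of it, feature-importance explanation, using scikit-learn's model-agnostic permutation_importance. This is a generic illustration, not the authors' method or released library; the dataset and model are placeholders chosen only so the snippet runs end to end.

    # Sketch: post-hoc feature-importance explanation of a black-box model.
    # Generic scikit-learn illustration, not the Perspective's released
    # package; the dataset and model below are placeholders.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    # Fit an opaque predictive model on a public clinical-style dataset.
    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    # Post hoc, score each feature by how much shuffling it degrades
    # held-out accuracy; larger drops mean heavier model reliance.
    result = permutation_importance(model, X_test, y_test,
                                    n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"{X.columns[i]}: {result.importances_mean[i]:.3f} "
              f"+/- {result.importances_std[i]:.3f}")

Note that this explanation is global (which inputs the model relies on overall); other types of explanation answer different questions, such as why an individual patient received a particular prediction.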

Code availability

The open-source software package is available on GitHub at https://github.com/vanderschaarlab/Interpretability.
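The package's own API is not described in this article, so as a generic illustration of a second, complementary form of post-hoc explanation, example-based reasoning, here is a sketch that justifies a prediction by retrieving the most similar training cases. All data and names are illustrative, and the code does not represent the released package's interface.

    # Sketch: example-based explanation, i.e., justifying a prediction by
    # surfacing the most similar training cases. Generic scikit-learn code,
    # not the released package's API; data and model are placeholders.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import NearestNeighbors

    X, y = load_breast_cancer(return_X_y=True)
    X_train, y_train, X_new = X[:-1], y[:-1], X[-1:]  # hold out one case to explain
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Index the training cohort; at prediction time retrieve the k nearest
    # cases so a clinician can compare the new patient against precedents.
    nn = NearestNeighbors(n_neighbors=3).fit(X_train)
    _, idx = nn.kneighbors(X_new)
    print("Prediction for held-out case:", model.predict(X_new)[0])
    print("Most similar training cases:", idx[0], "with outcomes", y_train[idx[0]])

A more faithful variant of this idea computes similarity in the model's learned representation rather than in raw feature space, so that the retrieved examples reflect what the model itself considers similar.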

Acknowledgements

F.I. and M.v.d.S. are supported by the National Science Foundation (NSF) under grant number 1722516. M.v.d.S. is also supported by the Office of Naval Research (ONR).

Author information

Contributions

F.I. and M.v.d.S. conceptualized the manuscript. R.D. developed the interpretability package. F.I. wrote the original draft of the manuscript, and all authors contributed to editing and revising it.

Corresponding author

Correspondence to Fergus Imrie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review information

Nature Machine Intelligence thanks Victor Volovici, Po-Hsuan Cameron Chen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Imrie, F., Davis, R. & van der Schaar, M. Multiple stakeholders drive diverse interpretability requirements for machine learning in healthcare. Nat Mach Intell 5, 824–829 (2023). https://doi.org/10.1038/s42256-023-00698-2
