Abstract
Artificial intelligence (AI) has achieved notable performance in many fields, and its research impact in healthcare is unquestionable. Nevertheless, the deployment of such computational models in clinical practice is still limited. Among the major barriers to successful real-world machine learning applications are the lack of transparency, reliability, and personalization. Indeed, these aspects are decisive not only for patient safety, but also for securing the confidence of healthcare professionals. Explainable AI aims to address concerns about the transparency and reliability of artificial intelligence, making it possible to better understand and trust a model and to justify its outcomes, thus effectively assisting clinicians in rationalizing the model's predictions. This work proposes an innovative machine learning based approach, implementing a hybrid scheme that systematically combines knowledge-driven and data-driven techniques. In the first step, a global set of interpretable rules is generated, founded on clinical evidence. In the second step, a machine learning model is trained to select, from the global set of rules, the subset that is most appropriate for a given patient, according to that patient's particular characteristics. This approach simultaneously addresses three of the central requirements of explainable AI—interpretability, personalization, and reliability—without impairing the accuracy of the model's predictions. The scheme was validated with a real dataset provided by two Portuguese hospitals, the Santa Cruz Hospital, Lisbon, and the Santo André Hospital, Leiria, comprising a total of N = 1111 patients who suffered an acute coronary syndrome event, for whom 30-day mortality was assessed. When compared with standard black-box structures (e.g. a feedforward neural network), the proposed scheme achieves similar performance, while simultaneously ensuring clinical interpretability and personalization of the model and providing a level of reliability for the estimated mortality risk.
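The two-step hybrid scheme described above can be sketched in code. This is a minimal illustrative sketch, not the authors' actual implementation: the rule conditions, risk values, the averaging of fired rules, and the 1-nearest-neighbour selector are all hypothetical assumptions chosen only to show how a knowledge-driven rule pool can be combined with a data-driven, per-patient rule selection step.

```python
import math

# Step 1 (knowledge-driven): a global pool of interpretable rules.
# Thresholds and risk contributions are illustrative, not clinical guidance.
RULES = [
    ("age >= 75",          lambda p: p["age"] >= 75,   0.30),
    ("sbp < 100 mmHg",     lambda p: p["sbp"] < 100,   0.25),
    ("hr > 110 bpm",       lambda p: p["hr"] > 110,    0.20),
    ("creatinine > 2.0",   lambda p: p["creat"] > 2.0, 0.25),
]

def rule_risk(patient, selected):
    """Average the risk contributions of the selected rules that fire."""
    fired = [risk for i, (_, cond, risk) in enumerate(RULES)
             if i in selected and cond(patient)]
    return sum(fired) / len(fired) if fired else 0.05  # baseline risk

# Step 2 (data-driven): select, for a new patient, the rule subset that
# worked best for the most similar training patient (here a toy 1-NN;
# any classifier could play this role).
def select_rules(patient, train):
    def dist(a, b):
        return math.sqrt(sum((a[k] - b[k]) ** 2
                             for k in ("age", "sbp", "hr", "creat")))
    nearest = min(train, key=lambda rec: dist(patient, rec["features"]))
    return nearest["best_subset"]

# Toy "training set": each record stores the subset of rule indices that
# gave the best prediction for that patient during training.
train = [
    {"features": {"age": 80, "sbp": 95, "hr": 90, "creat": 1.1},
     "best_subset": {0, 1}},
    {"features": {"age": 55, "sbp": 130, "hr": 120, "creat": 2.4},
     "best_subset": {2, 3}},
]

new_patient = {"age": 78, "sbp": 98, "hr": 85, "creat": 1.0}
subset = select_rules(new_patient, train)   # -> {0, 1}
risk = rule_risk(new_patient, subset)       # both rules fire: (0.30 + 0.25) / 2
print(sorted(subset), round(risk, 3))       # prints [0, 1] 0.275
```

Because the final estimate is produced only by the selected human-readable rules, the prediction for each patient comes with its own clinical justification, which is the interpretability and personalization property the abstract claims.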
Acknowledgements
This work was supported by the lookAfterRisk research project (POCI-01-0145-FEDER-030290). The authors would also like to thank the Santo André Hospital/Leiria Hospital Centre and the Santa Cruz Hospital, Lisbon, for providing the clinical datasets used in this study.
Communicated by Martin Atzmueller, Johannes Fürnkranz, Tomáš Kliegr and Ute Schmid.
Cite this article
Valente, F., Paredes, S., Henriques, J. et al. Interpretability, personalization and reliability of a machine learning based clinical decision support system. Data Min Knowl Disc 36, 1140–1173 (2022). https://doi.org/10.1007/s10618-022-00821-8