Abstract
Part-prototype models are explainable-by-design image classifiers and a promising alternative to black-box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts, and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net’s decision-making process is in line with medical classification standards, even though it is trained with only image-level class labels. Because of PIP-Net’s unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
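To make the correction step concrete: PIP-Net scores each learned prototype per image and combines these scores through a sparse linear layer into class logits, so disabling a prototype amounts to zeroing its weights in that layer. The sketch below is a minimal PyTorch-style illustration of this idea; the names `disable_prototypes` and `pipnet.classification_layer` are hypothetical, not the paper’s actual API.

```python
import torch

def disable_prototypes(classification_layer: torch.nn.Linear,
                       prototype_ids: list[int]) -> None:
    """Zero the given prototypes' contribution to every class logit.

    Sketch under the assumption that class scores come from a linear
    layer whose weight has shape (num_classes, num_prototypes), so
    column j carries prototype j's weight for each class.
    """
    with torch.no_grad():
        classification_layer.weight[:, prototype_ids] = 0.0

# Hypothetical usage: after inspecting the visualised prototypes, a user
# flags prototypes that latched onto artefacts (e.g. text in an X-ray)
# and switches them off without retraining:
# disable_prototypes(pipnet.classification_layer, [17, 42])
```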
Notes
1. If segmentation masks were not available, patch-related prototypes could efficiently be collected manually, since the sparsity of PIP-Net results in a reasonable number of relevant prototypes (only 119 for ISIC).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Nauta, M., Hegeman, J.H., Geerdink, J., Schlötterer, J., Keulen, M.v., Seifert, C. (2024). Interpreting and Correcting Medical Image Classification with PIP-Net. In: Nowaczyk, S., et al. Artificial Intelligence. ECAI 2023 International Workshops. ECAI 2023. Communications in Computer and Information Science, vol 1947. Springer, Cham. https://doi.org/10.1007/978-3-031-50396-2_11
Print ISBN: 978-3-031-50395-5
Online ISBN: 978-3-031-50396-2