Abstract
Although deep learning-based AI systems for diagnostic imaging tasks have shown virtually superhuman accuracy, their use in medical settings has been questioned due to their "black-box", non-interpretable nature. To address this shortcoming, several methods have been proposed to make AI eXplainable (XAI), including Pixel Attribution Methods (PAMs); however, it is still unclear whether these methods are actually effective in "opening" the black box and improving diagnosis, particularly in tasks where pathological conditions are difficult to detect. In this study, we focus on the detection of thoraco-lumbar fractures from X-rays, with the goal of assessing the impact of PAMs on diagnostic decision making by addressing two separate research questions: first, whether activation maps (AMs, an instance of PAMs) were perceived as useful in the aforementioned task; and, second, whether the maps were also capable of reducing the diagnostic error rate. We show that, even though the AMs were not considered significantly useful by the physicians, the image readers found high value in the maps with respect to other perceptual dimensions (i.e., pertinency, coherence) and, most importantly, in a pilot study involving 7 doctors interpreting a small, but carefully chosen, set of images, their diagnostic accuracy significantly improved when XAI support was provided.
Notes
- 1.
It is worth noting that activation maps are different from saliency maps, although the two terms are often used interchangeably. In fact, the two approaches rely on different methods to compute heatmaps: saliency maps are usually generated by back-propagating the class score with respect to the input of the network [35], while activation maps are derived from the feature maps computed at a specific layer of the network [42] (a minimal sketch contrasting the two approaches is given after these notes).
- 2.
More precisely, both specialists had to agree that the images were at least of level 3 ("sufficient image quality: moderate limitations for clinical use but no substantial loss of information") on the absolute Visual Grading Analysis (VGA) scale [28], a 5-value ordinal scale ranging from 1 ("excellent image quality: no limitations for clinical use") to 5 ("poor image quality: image not usable, loss of information, image must be repeated").
- 3.
Balance in model accuracy was also guaranteed at the class level: in each group (fractured vs. non-fractured), 2 images were associated with a misdiagnosis by the model, while 4 were correctly classified (see the selection sketch after these notes).
- 4.
- 5.
- 6.
Clarity would be a fourth relevant dimension of heatmaps for radiological use or XAI settings, as it relates to the accurate presentation of anatomical or pathological structures. However, this dimension was not assessed by the sample of readers involved, since the images had already been selected to be of optimal clarity. Moreover, the correlation between clarity and utility was conjectured to be obvious and therefore not worth investigating.
- 7.
It is noteworthy that the white-box paradox can also mislead doctors when the advice is wrong, in that the explanation can convince them of the incorrect conclusion, as has been reported in [7].
- 8.
It should be noted that "the implicit assumption [...] that the specific (diagnostic) message of the X-ray images resided inside them from the beginning, and that it is obscured either by technological or epistemological problems [is contestable as too naive]". Conversely, it has been argued [33] that "the specific content of the images was shaped by the activities of X-ray workers within the context of medical developments of the time" when X-ray imaging was introduced into medical practice at the beginning of the 20th century. Today, we could be witnessing the same phenomenon, in which radiologists, specialists, data scientists and ML developers could, if willing, participatively co-develop a machine semiotics for specific diagnostic tasks.
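To make the distinction drawn in Note 1 concrete, the following is a minimal, hypothetical sketch in PyTorch; an off-the-shelf ResNet-18 and its `layer4`/`fc` names stand in for the actual model used in the study, which is not reproduced here. The saliency map [35] is obtained by back-propagating the top class score to the input pixels, while the class activation map [42] weights the feature maps of the last convolutional layer by the classifier weights of the predicted class.

```python
# Hedged sketch: a torchvision ResNet-18 stands in for the study's model;
# tensor sizes and layer names are illustrative assumptions only.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for an X-ray

# Saliency map [35]: gradient of the top class score w.r.t. the input pixels.
score = model(x)[0].max()
score.backward()
saliency = x.grad.abs().max(dim=1)[0]  # (1, 224, 224) pixel-level heatmap

# Activation map (CAM) [42]: feature maps at the last conv layer, weighted
# by the fully-connected weights of the predicted class.
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o))
with torch.no_grad():
    logits = model(x)
    cls = logits.argmax(dim=1)      # predicted class index
    w = model.fc.weight[cls]        # (1, C) class-specific weights
    cam = torch.einsum("bc,bchw->bhw", w, feats["maps"]).relu()
    # upsample `cam` to the input resolution before overlaying on the image
```

Note that the activation map lives at the resolution of the chosen layer (here 7×7) and is upsampled for display, which is one reason activation maps look coarser than pixel-level saliency maps.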
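Similarly, the class-balanced selection described in Note 3 (two model-misclassified and four correctly classified images per class) can be sketched as follows; the field names, helper name and seed are illustrative assumptions, not the authors' actual procedure.

```python
# Hypothetical sketch of the class-balanced case selection from Note 3.
import random

def select_cases(cases, per_class_wrong=2, per_class_right=4, seed=0):
    """cases: list of dicts with 'label' (0/1) and 'correct' (bool)."""
    rng = random.Random(seed)
    picked = []
    for label in (0, 1):  # non-fractured vs. fractured
        wrong = [c for c in cases if c["label"] == label and not c["correct"]]
        right = [c for c in cases if c["label"] == label and c["correct"]]
        picked += rng.sample(wrong, per_class_wrong)
        picked += rng.sample(right, per_class_right)
    rng.shuffle(picked)  # avoid presentation-order cues
    return picked
```

Applied to the full candidate pool, this yields 12 images (6 fractured, 6 non-fractured) with model accuracy balanced at 4 out of 6 within each class.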
References
Aggarwal, R., et al.: Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit. Med. 4(1), 1–23 (2021)
Alqaraawi, A., Schuessler, M., Weiß, P., Costanza, E., Berthouze, N.: Evaluating saliency map explanations for convolutional neural networks: a user study. In: Proceedings of the 25th International Conference on Intelligent User Interfaces, pp. 275–285 (2020)
Arun, N., et al.: Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiol. Artif. Intell. 3(6), e200267 (2021)
Ayhan, M.S., et al.: Clinical validation of saliency maps for understanding deep neural networks in ophthalmology. Med. Image Anal. 77, 102364 (2022)
Badgeley, M.A., et al.: Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit. Med. 2(1), 1–10 (2019)
Balki, I., et al.: Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Can. Assoc. Radiol. J. 70(4), 344–353 (2019)
Bansal, G., et al.: Does the whole exceed its parts? The effect of AI explanations on complementary team performance. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–16 (2021)
Becherer, N., Pecarina, J., Nykl, S., Hopkinson, K.: Improving optimization of convolutional neural networks through parameter fine-tuning. Neural Comput. Appl. 31(8), 3469–3479 (2017). https://doi.org/10.1007/s00521-017-3285-0
Brynjolfsson, E., Mitchell, T.: What can machine learning do? Workforce implications. Science 358(6370), 1530–1534 (2017)
Cabitza, F., Campagner, A., Del Zotti, F., Ravizza, A., Sternini, F.: All you need is higher accuracy? On the quest for minimum acceptable accuracy for medical artificial intelligence. In: e-Health 2020, Proceedings of the 12th International Conference on e-Health, pp. 159–166 (2020)
Cabitza, F.: Biases affecting human decision making in AI-supported second opinion settings. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds.) MDAI 2019. LNCS (LNAI), vol. 11676, pp. 283–294. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26773-5_25
Cabitza, F., Campagner, A., Cavosi, V.: Assessing the impact of medical AI: a survey of physicians’ perceptions. In: 2021 5th International Conference on Medical and Health Informatics, pp. 225–231 (2021)
Cabitza, F., Campagner, A., Simone, C.: The need to move away from agential-AI: empirical investigations, useful concepts and open issues. Int. J. Hum Comput Stud. 155, 102696 (2021)
Chinn, S.: A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat. Med. 19(22), 3127–3131 (2000)
Chlap, P., Min, H., Vandenberg, N., Dowling, J., Holloway, L., Haworth, A.: A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 65(5), 545–563 (2021)
Croskerry, P., Cosby, K., Graber, M.L., Singh, H.: Diagnosis: Interpreting the Shadows. CRC Press, Boca Raton (2017)
Delmas, P.D., et al.: Underdiagnosis of vertebral fractures is a worldwide problem: the IMPACT study. J. Bone Miner. Res. 20(4), 557–563 (2005)
Ghassemi, M., Oakden-Rayner, L., Beam, A.L.: The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3(11), e745–e750 (2021)
Han, T., et al.: Advancing diagnostic performance and clinical usability of neural networks via adversarial training and dual batch normalization. Nat. Commun. 12(1), 1–11 (2021)
Handelman, G.S., et al.: Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. Am. J. Roentgenol. 212(1), 38–43 (2019)
Holzinger, A., Saranti, A., Molnar, C., Biecek, P., Samek, W.: Explainable AI methods - a brief overview. In: Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K.R., Samek, W. (eds.) xxAI 2020. LNCS, vol. 13200, pp. 13–38. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-04083-2_2
Holzinger, A.T., Muller, H.: Toward human-AI interfaces to support explainability and causability in medical AI. Computer 54(10), 78–86 (2021)
Hwang, E.J., et al.: Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw. Open 2(3), e191095 (2019)
Jha, S., Topol, E.J.: Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 316(22), 2353–2354 (2016)
Ke, A., Ellsworth, W., Banerjee, O., Ng, A.Y., Rajpurkar, P.: CheXtransfer: performance and parameter efficiency of ImageNet models for chest X-Ray interpretation. CoRR abs/2101.06871 (2021)
Liu, X., et al.: A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1(6), e271–e297 (2019)
Lohoff, L., Rühr, A.: Introducing (machine) learning ability as antecedent of trust in intelligent systems. In: ECIS 2021 Research Papers, vol. 23 (2021)
Ludewig, E., Richter, A., Frame, M.: Diagnostic imaging-evaluating image quality using visual grading characteristic (VGC) analysis. Vet. Res. Commun. 34(5), 473–479 (2010). https://doi.org/10.1007/s11259-010-9413-2
Lyell, D., Coiera, E.: Automation bias and verification complexity: a systematic review. J. Am. Med. Inform. Assoc. 24(2), 423–431 (2017)
Nandi, A., Pal, A.K.: Detailing image interpretability methods. In: Nandi, A., Pal, A.K. (eds.) Interpreting Machine Learning Models, pp. 271–293. Springer, Cham (2022). https://doi.org/10.1007/978-1-4842-7802-4_12
Neves, I., et al.: Interpretable heartbeat classification using local model-agnostic explanations on ECGs. Comput. Biol. Med. 133, 104393 (2021)
Olczak, J., et al.: Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthop. 88(6), 581–586 (2017)
Pasveer, B.: Knowledge of shadows: the introduction of X-ray images in medicine. Sociol. Health Illn. 11(4), 360–381 (1989)
Ribeiro, M.T., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: Workshop at International Conference on Learning Representations (2014)
Spinks, G., Moens, M.F.: Justifying diagnosis decisions by deep neural networks. J. Biomed. Inform. 96, 103248 (2019)
Taylor, R.: Interpretation of the correlation coefficient: a basic review. J. Diagn. Med. Sonogr. 6(1), 35–39 (1990)
Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., Saarakkala, S.: Automatic knee osteoarthritis diagnosis from plain radiographs: a deep learning-based approach. Sci. Rep. 8(1), 1–10 (2018)
Tschandl, P., et al.: Human-computer collaboration for skin cancer recognition. Nat. Med. 26(8), 1229–1234 (2020)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
Yang, S., Yin, B., Cao, W., Feng, C., Fan, G., He, S.: Diagnostic accuracy of deep learning in orthopaedic fractures: a systematic review and meta-analysis. Clin. Radiol. 75(9), 713-e17 (2020)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016