Abstract
Although deep learning-based AI systems for diagnostic imaging tasks have shown virtually superhuman accuracy, their use in medical settings has been questioned due to their "black-box", non-interpretable nature. To address this shortcoming, several methods have been proposed to make AI eXplainable (XAI), including Pixel Attribution Methods (PAMs); however, it is still unclear whether these methods are actually effective in "opening" the black box and improving diagnosis, particularly in tasks where pathological conditions are difficult to detect. In this study, we focus on the detection of thoraco-lumbar fractures from X-rays, with the goal of assessing the impact of PAMs on diagnostic decision making by addressing two separate research questions: first, whether activation maps (AMs, an instance of PAMs) were perceived as useful in the aforementioned task; and, second, whether the maps were also capable of reducing the diagnostic error rate. We show that, even though the AMs were not considered significantly useful by the physicians, the image readers found high value in the maps with respect to other perceptual dimensions (i.e., pertinency, coherence) and, most importantly, in a pilot study involving 7 doctors interpreting a small, but carefully chosen, set of images, their diagnostic accuracy significantly improved when XAI support was provided.
Notes
- 1.
It is worth noting that activation maps are different from saliency maps, although the two terms are often used interchangeably. In fact, the two approaches rely on different methods to compute heatmaps: saliency maps are usually generated by back-propagating the class score with respect to the input of the network [35], while activation maps are derived from the feature maps computed at a specific layer of the network [42] (a minimal sketch contrasting the two approaches is given after these notes).
- 2.
More precisely, both specialists had to agree that the images were at least of level 3 ("sufficient image quality: moderate limitations for clinical use but no substantial loss of information") on the absolute Visual Grading Analysis (VGA) scale [28], a 5-value ordinal scale ranging from 1 ("excellent image quality: no limitations for clinical use") to 5 ("poor image quality: image not usable, loss of information, image must be repeated").
- 3.
Balance in model accuracy was also guaranteed at the class level: in each group (fractured vs. non-fractured), 2 images were associated with a misdiagnosis by the model, while 4 were correctly classified (see the selection sketch after these notes).
- 4.
- 5.
- 6.
Clarity would be a fourth relevant dimension of heatmaps for radiological use or XAI settings, as it relates to the accurate presentation of anatomical or pathological structures. However, this dimension was not assessed by the sample of readers involved, since the images had already been selected to be of optimal clarity. Moreover, the correlation between clarity and utility was conjectured to be obvious and therefore not worth investigating.
- 7.
It is noteworthy that the white-box paradox can also mislead doctors when the advice is wrong, in that the explanation can convince them of the incorrect conclusion, as has been reported in [7].
- 8.
It should be noted that "the implicit assumption [...] that the specific (diagnostic) message of the X-ray images resided inside them from the beginning, and that it is obscured either by technological or epistemological problems [is contestable as too naive]". Conversely, it has been argued [33] that "the specific content of the images was shaped by the activities of X-ray workers within the context of medical developments of the time" when X-ray imaging was introduced into medical practice at the beginning of the 20th century. Today, we could be witnessing the same phenomenon, in which radiologists, specialists, data scientists and ML developers could, if willing, participatively co-develop a machine semiotics for specific diagnostic tasks.
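To make the distinction drawn in Note 1 concrete, the following is a minimal, hypothetical sketch in PyTorch; an off-the-shelf ResNet-18 and its `layer4`/`fc` names stand in for the actual model used in the study, which is not reproduced here. The saliency map [35] is obtained by back-propagating the top class score to the input pixels, while the class activation map [42] weights the feature maps of the last convolutional layer by the classifier weights of the predicted class.

```python
# Hedged sketch: a torchvision ResNet-18 stands in for the study's model;
# tensor sizes and layer names are illustrative assumptions only.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for an X-ray

# Saliency map [35]: gradient of the top class score w.r.t. the input pixels.
score = model(x)[0].max()
score.backward()
saliency = x.grad.abs().max(dim=1)[0]  # (1, 224, 224) pixel-level heatmap

# Activation map (CAM) [42]: feature maps at the last conv layer, weighted
# by the fully-connected weights of the predicted class.
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(maps=o))
with torch.no_grad():
    logits = model(x)
    cls = logits.argmax(dim=1)      # predicted class index
    w = model.fc.weight[cls]        # (1, C) class-specific weights
    cam = torch.einsum("bc,bchw->bhw", w, feats["maps"]).relu()
    # upsample `cam` to the input resolution before overlaying on the image
```

Note that the activation map lives at the resolution of the chosen layer (here 7×7) and is upsampled for display, which is one reason activation maps look coarser than pixel-level saliency maps.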
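Similarly, the class-balanced selection described in Note 3 (two model-misclassified and four correctly classified images per class) can be sketched as follows; the field names, helper name and seed are illustrative assumptions, not the authors' actual procedure.

```python
# Hypothetical sketch of the class-balanced case selection from Note 3.
import random

def select_cases(cases, per_class_wrong=2, per_class_right=4, seed=0):
    """cases: list of dicts with 'label' (0/1) and 'correct' (bool)."""
    rng = random.Random(seed)
    picked = []
    for label in (0, 1):  # non-fractured vs. fractured
        wrong = [c for c in cases if c["label"] == label and not c["correct"]]
        right = [c for c in cases if c["label"] == label and c["correct"]]
        picked += rng.sample(wrong, per_class_wrong)
        picked += rng.sample(right, per_class_right)
    rng.shuffle(picked)  # avoid presentation-order cues
    return picked
```

Applied to the full candidate pool, this yields 12 images (6 fractured, 6 non-fractured) with model accuracy balanced at 4 out of 6 within each class.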
References
Aggarwal, R., et al.: Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit. Med. 4(1), 1–23 (2021)
Alqaraawi, A., Schuessler, M., Weiß, P., Costanza, E., Berthouze, N.: Evaluating saliency map explanations for convolutional neural networks: a user study. In: Proceedings of the 25th International Conference on Intelligent User Interfaces, pp. 275–285 (2020)
Arun, N., et al.: Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiol. Artif. Intell. 3(6), e200267 (2021)
Ayhan, M.S., et al.: Clinical validation of saliency maps for understanding deep neural networks in ophthalmology. Med. Image Anal. 77, 102364 (2022)
Badgeley, M.A., et al.: Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit. Med. 2(1), 1–10 (2019)
Balki, I., et al.: Sample-size determination methodologies for machine learning in medical imaging research: a systematic review. Can. Assoc. Radiol. J. 70(4), 344–353 (2019)
Bansal, G., et al.: Does the whole exceed its parts? The effect of AI explanations on complementary team performance. In: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–16 (2021)
Becherer, N., Pecarina, J., Nykl, S., Hopkinson, K.: Improving optimization of convolutional neural networks through parameter fine-tuning. Neural Comput. Appl. 31(8), 3469–3479 (2017). https://doi.org/10.1007/s00521-017-3285-0
Brynjolfsson, E., Mitchell, T.: What can machine learning do? Workforce implications. Science 358(6370), 1530–1534 (2017)
Cabitza, F., Campagner, A., Del Zotti, F., Ravizza, A., Sternini, F.: All you need is higher accuracy? On the quest for minimum acceptable accuracy for medical artificial intelligence. In: e-Health 2020, Proceedings of the 12th International Conference on e-Health, pp. 159–166 (2020)
Cabitza, F.: Biases affecting human decision making in AI-supported second opinion settings. In: Torra, V., Narukawa, Y., Pasi, G., Viviani, M. (eds.) MDAI 2019. LNCS (LNAI), vol. 11676, pp. 283–294. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26773-5_25
Cabitza, F., Campagner, A., Cavosi, V.: Assessing the impact of medical AI: a survey of physicians’ perceptions. In: 2021 5th International Conference on Medical and Health Informatics, pp. 225–231 (2021)
Cabitza, F., Campagner, A., Simone, C.: The need to move away from agential-AI: empirical investigations, useful concepts and open issues. Int. J. Hum Comput Stud. 155, 102696 (2021)
Chinn, S.: A simple method for converting an odds ratio to effect size for use in meta-analysis. Stat. Med. 19(22), 3127–3131 (2000)
Chlap, P., Min, H., Vandenberg, N., Dowling, J., Holloway, L., Haworth, A.: A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 65(5), 545–563 (2021)
Croskerry, P., Cosby, K., Graber, M.L., Singh, H.: Diagnosis: Interpreting the Shadows. CRC Press, Boca Raton (2017)
Delmas, P.D., et al.: Underdiagnosis of vertebral fractures is a worldwide problem: the IMPACT study. J. Bone Miner. Res. 20(4), 557–563 (2005)
Ghassemi, M., Oakden-Rayner, L., Beam, A.L.: The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 3(11), e745–e750 (2021)
Han, T., et al.: Advancing diagnostic performance and clinical usability of neural networks via adversarial training and dual batch normalization. Nat. Commun. 12(1), 1–11 (2021)
Handelman, G.S., et al.: Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. Am. J. Roentgenol. 212(1), 38–43 (2019)
Holzinger, A., Saranti, A., Molnar, C., Biecek, P., Samek, W.: Explainable AI methods - a brief overview. In: Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, K.R., Samek, W. (eds.) xxAI 2020. LNCS, vol. 13200, pp. 13–38. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-04083-2_2
Holzinger, A.T., Muller, H.: Toward human-AI interfaces to support explainability and causability in medical AI. Computer 54(10), 78–86 (2021)
Hwang, E.J., et al.: Development and validation of a deep learning-based automated detection algorithm for major thoracic diseases on chest radiographs. JAMA Netw. Open 2(3), e191095 (2019)
Jha, S., Topol, E.J.: Adapting to artificial intelligence: radiologists and pathologists as information specialists. JAMA 316(22), 2353–2354 (2016)
Ke, A., Ellsworth, W., Banerjee, O., Ng, A.Y., Rajpurkar, P.: CheXtransfer: performance and parameter efficiency of ImageNet models for chest X-Ray interpretation. CoRR abs/2101.06871 (2021)
Liu, X., et al.: A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit. Health 1(6), e271–e297 (2019)
Lohoff, L., Rühr, A.: Introducing (machine) learning ability as antecedent of trust in intelligent systems. In: ECIS 2021 Research Papers, vol. 23 (2021)
Ludewig, E., Richter, A., Frame, M.: Diagnostic imaging-evaluating image quality using visual grading characteristic (VGC) analysis. Vet. Res. Commun. 34(5), 473–479 (2010). https://doi.org/10.1007/s11259-010-9413-2
Lyell, D., Coiera, E.: Automation bias and verification complexity: a systematic review. J. Am. Med. Inform. Assoc. 24(2), 423–431 (2017)
Nandi, A., Pal, A.K.: Detailing image interpretability methods. In: Nandi, A., Pal, A.K. (eds.) Interpreting Machine Learning Models, pp. 271–293. Springer, Cham (2022). https://doi.org/10.1007/978-1-4842-7802-4_12
Neves, I., et al.: Interpretable heartbeat classification using local model-agnostic explanations on ECGs. Comput. Biol. Med. 133, 104393 (2021)
Olczak, J., et al.: Artificial intelligence for analyzing orthopedic trauma radiographs. Acta Orthop. 88(6), 581–586 (2017)
Pasveer, B.: Knowledge of shadows: the introduction of X-ray images in medicine. Sociol. Health Illn. 11(4), 360–381 (1989)
Ribeiro, M.T., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386 (2016)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: Workshop at International Conference on Learning Representations (2014)
Spinks, G., Moens, M.F.: Justifying diagnosis decisions by deep neural networks. J. Biomed. Inform. 96, 103248 (2019)
Taylor, R.: Interpretation of the correlation coefficient: a basic review. J. Diagn. Med. Sonogr. 6(1), 35–39 (1990)
Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P., Saarakkala, S.: Automatic knee osteoarthritis diagnosis from plain radiographs: a deep learning-based approach. Sci. Rep. 8(1), 1–10 (2018)
Tschandl, P., et al.: Human-computer collaboration for skin cancer recognition. Nat. Med. 26(8), 1229–1234 (2020)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
Yang, S., Yin, B., Cao, W., Feng, C., Fan, G., He, S.: Diagnostic accuracy of deep learning in orthopaedic fractures: a systematic review and meta-analysis. Clin. Radiol. 75(9), 713-e17 (2020)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016