
Quantifying uncertainty in machine learning classifiers for medical imaging

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

Machine learning (ML) models in medical imaging (MI) can be of great value in computer-aided diagnostic systems, but little attention has been given to the confidence (or, conversely, the uncertainty) of such models, which can have significant clinical implications. This paper applied, validated, and explored a technique for assessing uncertainty in convolutional neural networks (CNNs) in the context of MI.
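
The specific uncertainty measure is developed in the full text. As a rough, non-authoritative illustration of the family of techniques involved, the sketch below estimates per-image classification uncertainty for a CNN using Monte Carlo dropout, a widely used approach; the model, the number of stochastic passes, and the entropy summary are illustrative assumptions here, not the authors' exact method.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, n_passes: int = 50):
    """Estimate predictive uncertainty by keeping dropout active at
    inference time and averaging over n_passes stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers; batch norm stays in eval mode.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_passes)])  # (passes, batch, classes)
    mean_probs = probs.mean(dim=0)
    # Predictive entropy: 0 for a fully confident prediction, higher
    # values for predictions spread across classes.
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return mean_probs, entropy
```

A confidence score between 0 and 1, like those reported for the test images in Fig. S1, could then be derived by normalizing the entropy or simply taking the maximum of the mean class probabilities.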

Materials and methods

We used two publicly accessible imaging datasets, a chest X-ray dataset (pneumonia vs. control) and a skin cancer imaging dataset (malignant vs. benign), to explore the proposed measure of uncertainty through experiments with varying class imbalances and training-set sizes, and experiments with test images close to the classification boundary (sketched below). We further verified our hypothesis by examining the relationship between the uncertainty metric and other performance metrics, and by cross-checking CNN predictions and confidence scores against an expert radiologist (available in the Supplementary Information). Additionally, bounds were derived on the uncertainty metric, and recommendations for its interpretation were made.
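
The exact imbalance ratios, sample sizes, and boundary criterion used in the experiments are given in the full text; the following sketch shows one plausible way such experimental conditions could be constructed. The function names, the fixed total size in the example, and the 0.5-boundary tolerance are hypothetical.

```python
import numpy as np

def make_imbalanced_subset(pos_idx, neg_idx, total_size, pos_fraction, rng=None):
    """Sample a training subset of fixed total size with a chosen
    positive-class fraction (0.5 gives a balanced set)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_pos = int(round(total_size * pos_fraction))
    pos = rng.choice(pos_idx, size=n_pos, replace=False)
    neg = rng.choice(neg_idx, size=total_size - n_pos, replace=False)
    return np.concatenate([pos, neg])

def near_boundary(mean_probs, tol=0.1):
    """Flag test images whose mean positive-class probability lies within
    tol of the 0.5 decision boundary (a hypothetical rule for selecting
    'less-obvious' images)."""
    return np.abs(np.asarray(mean_probs) - 0.5) < tol

# Example sweep: train one model per imbalance ratio at a fixed size of 1,000.
# for frac in (0.1, 0.25, 0.5, 0.75, 0.9):
#     subset = make_imbalanced_subset(pos_idx, neg_idx, 1000, frac)
```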

Results

With respect to training-set class imbalance for the pneumonia MI dataset, the uncertainty metric was minimized when the two classes were nearly equal in size (regardless of training-set size) and was approximately 17% smaller than the maximum uncertainty resulting from greater imbalance. Less-obvious test images (those closer to the classification boundary) produced higher classification uncertainty, about 10–15 times greater than that of images further from the boundary. Relevant MI performance metrics such as accuracy, sensitivity, and specificity showed seemingly negative linear correlations with the uncertainty metric, though none were statistically significant (p ≥ 0.05). The expert radiologist and the CNN agreed on a small sample of test images, though this finding is only preliminary.
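
For readers wanting to reproduce this kind of check, a Pearson correlation between a performance metric and the uncertainty metric across experimental runs can be tested as below; the numbers are placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder per-run summaries (illustrative values only, not study data).
accuracy    = np.array([0.91, 0.84, 0.88, 0.80, 0.86])
uncertainty = np.array([0.12, 0.18, 0.15, 0.20, 0.16])

# pearsonr returns the correlation coefficient r and a two-sided p-value;
# the paper reports negative r with p >= 0.05 (i.e., not significant).
r, p = pearsonr(accuracy, uncertainty)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```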

Conclusions

This paper demonstrated the importance of reporting uncertainty alongside predictions in medical imaging. The results show considerable potential for automatically assessing classifier reliability on each prediction with the proposed uncertainty metric.



Funding

No funds, grants, or other support was received.

Author information

Corresponding author

Correspondence to Pascal N. Tyrrell.

Ethics declarations

Conflict of interest

Dr. Bilbily is an officer of, and holds shares in, 16 Bit Inc., a medical AI startup company founded in 2016. Dr. Levman is the founder of Time Will Tell Technologies, an AI-focused technology startup company founded in 2021.

Ethical standards

This article does not contain any studies with human participants or animals performed by any of the authors.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain patient data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

11548_2022_2578_MOESM1_ESM.png

Fig. S1 Ten test-set images (the 5 most-confidently and the 5 least-confidently classified by the CNN) with CNN confidence scores between 0 and 1 (in orange) alongside radiologist confidence scores (in blue) for a machine-human comparison (PNG 15 kb)

Supplementary file2 (DOCX 24 kb)

Rights and permissions

Reprints and permissions

About this article


Cite this article

Valen, J., Balki, I., Mendez, M. et al. Quantifying uncertainty in machine learning classifiers for medical imaging. Int J CARS 17, 711–718 (2022). https://doi.org/10.1007/s11548-022-02578-3
