
Quantifying uncertainty in machine learning classifiers for medical imaging

  • Original Article
International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

Machine learning (ML) models in medical imaging (MI) can be of great value in computer-aided diagnostic systems, but little attention has been given to the confidence (or, conversely, the uncertainty) of such models, which can have significant clinical implications. This paper applied, validated, and explored a technique for assessing uncertainty in convolutional neural networks (CNNs) in the context of MI.
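
The specific uncertainty measure is developed in the full text. As a rough, non-authoritative illustration of the family of techniques involved, the sketch below estimates per-image classification uncertainty for a CNN using Monte Carlo dropout, a widely used approach; the model, the number of stochastic passes, and the entropy summary are illustrative assumptions here, not the authors' exact method.

```python
import torch
import torch.nn as nn

def mc_dropout_uncertainty(model: nn.Module, x: torch.Tensor, n_passes: int = 50):
    """Estimate predictive uncertainty by keeping dropout active at
    inference time and averaging over n_passes stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers; batch norm stays in eval mode.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_passes)])  # (passes, batch, classes)
    mean_probs = probs.mean(dim=0)
    # Predictive entropy: 0 for a fully confident prediction, higher
    # values for predictions spread across classes.
    entropy = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=-1)
    return mean_probs, entropy
```

A confidence score between 0 and 1, like those reported for the test images in Fig. S1, could then be derived by normalizing the entropy or simply taking the maximum of the mean class probabilities.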

Materials and methods

We used two publicly accessible imaging datasets, a chest X-ray dataset (pneumonia vs. control) and a skin cancer imaging dataset (malignant vs. benign), to explore the proposed measure of uncertainty through experiments with varying class imbalances and training-set sizes, and experiments with test images close to the classification boundary (sketched below). We further verified our hypothesis by examining the relationship between the uncertainty metric and other performance metrics, and by cross-checking CNN predictions and confidence scores against an expert radiologist (available in the Supplementary Information). Additionally, bounds were derived on the uncertainty metric, and recommendations for its interpretation were made.
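
The exact imbalance ratios, sample sizes, and boundary criterion used in the experiments are given in the full text; the following sketch shows one plausible way such experimental conditions could be constructed. The function names, the fixed total size in the example, and the 0.5-boundary tolerance are hypothetical.

```python
import numpy as np

def make_imbalanced_subset(pos_idx, neg_idx, total_size, pos_fraction, rng=None):
    """Sample a training subset of fixed total size with a chosen
    positive-class fraction (0.5 gives a balanced set)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_pos = int(round(total_size * pos_fraction))
    pos = rng.choice(pos_idx, size=n_pos, replace=False)
    neg = rng.choice(neg_idx, size=total_size - n_pos, replace=False)
    return np.concatenate([pos, neg])

def near_boundary(mean_probs, tol=0.1):
    """Flag test images whose mean positive-class probability lies within
    tol of the 0.5 decision boundary (a hypothetical rule for selecting
    'less-obvious' images)."""
    return np.abs(np.asarray(mean_probs) - 0.5) < tol

# Example sweep: train one model per imbalance ratio at a fixed size of 1,000.
# for frac in (0.1, 0.25, 0.5, 0.75, 0.9):
#     subset = make_imbalanced_subset(pos_idx, neg_idx, 1000, frac)
```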

Results

With respect to training-set class imbalance for the pneumonia MI dataset, the uncertainty metric was minimized when the two classes were nearly equal in size (regardless of training-set size) and was approximately 17% smaller than the maximum uncertainty resulting from greater imbalance. Less-obvious test images (those closer to the classification boundary) produced higher classification uncertainty, about 10–15 times greater than that of images further from the boundary. Relevant MI performance metrics such as accuracy, sensitivity, and specificity showed seemingly negative linear correlations with the uncertainty metric, though none were statistically significant (p ≥ 0.05). The expert radiologist and the CNN agreed on a small sample of test images, though this finding is only preliminary.
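
For readers wanting to reproduce this kind of check, a Pearson correlation between a performance metric and the uncertainty metric across experimental runs can be tested as below; the numbers are placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder per-run summaries (illustrative values only, not study data).
accuracy    = np.array([0.91, 0.84, 0.88, 0.80, 0.86])
uncertainty = np.array([0.12, 0.18, 0.15, 0.20, 0.16])

# pearsonr returns the correlation coefficient r and a two-sided p-value;
# the paper reports negative r with p >= 0.05 (i.e., not significant).
r, p = pearsonr(accuracy, uncertainty)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```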

Conclusions

This paper demonstrated the importance of reporting uncertainty alongside predictions in medical imaging. The results show considerable potential for automatically assessing classifier reliability on each prediction with the proposed uncertainty metric.



Funding

No funds, grants, or other support was received.

Author information

Corresponding author

Correspondence to Pascal N. Tyrrell.

Ethics declarations

Conflict of interest

Dr. Bilbily is an officer of, and holds shares in, 16 Bit Inc., a medical AI startup company founded in 2016. Dr. Levman is the founder of Time Will Tell Technologies, an AI-focused technology startup company founded in 2021.

Ethical standards

This article does not contain any studies with human participants or animals performed by any of the authors.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

This article does not contain patient data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

11548_2022_2578_MOESM1_ESM.png

Fig. S1 Ten test-set images (the 5 most-confidently and the 5 least-confidently classified by the CNN) with CNN confidence scores between 0 and 1 (in orange) alongside radiologist confidence scores (in blue) for a machine-human comparison (PNG 15 kb)

Supplementary file2 (DOCX 24 kb)

Rights and permissions

Reprints and permissions

About this article


Cite this article

Valen, J., Balki, I., Mendez, M. et al. Quantifying uncertainty in machine learning classifiers for medical imaging. Int J CARS 17, 711–718 (2022). https://doi.org/10.1007/s11548-022-02578-3
