Abstract
Recent developments in deep learning (DL) techniques have substantially improved performance in medical image segmentation tasks, especially with the Transformer model and its variants. While labels obtained by fusing multi-rater manual segmentations are often used as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extreme anatomical variability can still affect the performance and uncertainty of the resulting algorithms. Knowledge of how inter-rater variability affects the reliability of the resulting DL algorithms, a key element in clinical deployment, can inform better training data construction and DL model design, but has not been explored extensively. In this paper, we measure aleatoric and epistemic uncertainties using test-time augmentation (TTA), test-time dropout (TTD), and deep ensembles to explore their relationship with inter-rater variability. Furthermore, we compare UNet and TransUNet under two label fusion strategies to study the impact of Transformers on model uncertainty. We conduct a case study on multi-class paraspinal muscle segmentation from T2w MRIs. Our study reveals an interplay between inter-rater variability and model uncertainties that is affected by the choice of label fusion strategy and DL model.
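As an illustration only, the sketch below shows one common way to turn repeated stochastic predictions (for example from TTA, TTD, or ensemble members) into per-pixel uncertainty maps. The abstract does not state which uncertainty measures the paper uses, so the entropy-based measures, the function name `uncertainty_maps`, the array layout, and the random stand-in data are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def uncertainty_maps(probs, eps=1e-12):
    """Hypothetical helper (not from the paper).

    probs: softmax outputs with shape (T, C, H, W), where T is the number of
    stochastic passes (augmented inputs, dropout samples, or ensemble members)
    and C is the number of classes.
    """
    probs = np.asarray(probs, dtype=np.float64)
    mean_probs = probs.mean(axis=0)  # (C, H, W), averaged over passes

    # Entropy of the averaged prediction: total predictive uncertainty.
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)

    # Average entropy of the individual predictions.
    expected_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=0)

    # Mutual information: disagreement between passes.
    mutual_information = predictive_entropy - expected_entropy

    segmentation = mean_probs.argmax(axis=0)  # (H, W) label map
    return segmentation, predictive_entropy, mutual_information

# Stand-in data: 8 passes, 5 classes, a 4x4 image (random, for illustration).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 5, 4, 4))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

seg, entropy, mi = uncertainty_maps(probs)
```

In practice, `probs` would be collected by running a trained segmentation network several times. Maps from augmented inputs are commonly read as reflecting aleatoric uncertainty, and maps from dropout samples or ensemble members as reflecting epistemic uncertainty.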
Acknowledgment
We acknowledge the support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and thank NVIDIA for the donation of the GPU.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Roshanzamir, P. et al. (2023). How Inter-rater Variability Relates to Aleatoric and Epistemic Uncertainty: A Case Study with Deep Learning-Based Paraspinal Muscle Segmentation. In: Sudre, C.H., Baumgartner, C.F., Dalca, A., Mehta, R., Qin, C., Wells, W.M. (eds) Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2023. Lecture Notes in Computer Science, vol 14291. Springer, Cham. https://doi.org/10.1007/978-3-031-44336-7_8
DOI: https://doi.org/10.1007/978-3-031-44336-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44335-0
Online ISBN: 978-3-031-44336-7
eBook Packages: Computer Science (R0)