Abstract
Deep learning methods have shown promising performance in medical image semantic segmentation. High-quality annotations, however, remain costly and hard to obtain because clinicians are pressed for time. In this paper, we propose to harness the power of the Vision Transformer (ViT) within a semi-supervised framework for medical image semantic segmentation. The framework consists of a student model and a teacher model: the student model learns from image feature information and drives the parameter updates of the teacher model. We study the consistency between the student and teacher models' inferences on unlabeled data, so the whole framework is trained to minimize a supervised segmentation loss and a semi-supervised consistency loss. To improve semi-supervised performance, an uncertainty estimation scheme is introduced so that the student model learns only from reliable inferences when calculating the consistency loss. We further study how to filter inconclusive predictions via an uncertainty threshold and how to weight the sum of the two losses during training. In addition, ViT is selected and adapted as the backbone of the semi-supervised framework for its ability to model long-range dependencies. Our proposed method is evaluated with a variety of metrics on a public benchmark MRI dataset. The results demonstrate competitive performance against other state-of-the-art semi-supervised algorithms as well as several segmentation backbones.
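To make the training scheme above concrete, the following is a minimal PyTorch-style sketch of one training step under the usual mean-teacher recipe [17]: the teacher's weights are an exponential moving average (EMA) of the student's, and the consistency loss is computed only at pixels a certainty mask deems reliable. The names ema_update, training_step and certainty_mask, and the hyperparameters alpha, tau and lambda_c, are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    # Teacher weights are an exponential moving average of student weights.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1.0 - alpha)

def training_step(student, teacher, labeled, labels, unlabeled, tau, lambda_c):
    # Supervised segmentation loss on the labeled batch.
    sup_loss = F.cross_entropy(student(labeled), labels)

    # Consistency loss: the student should match the teacher on unlabeled
    # data, but only at pixels the teacher is sufficiently certain about.
    with torch.no_grad():
        teacher_probs = torch.softmax(teacher(unlabeled), dim=1)
    student_probs = torch.softmax(student(unlabeled), dim=1)
    mask = certainty_mask(teacher, unlabeled, tau)  # hypothetical helper, 1 = reliable pixel; sketched in the Appendix
    cons_loss = (((student_probs - teacher_probs) ** 2) * mask).sum() / mask.sum().clamp(min=1)

    # Weighted sum of the supervised and semi-supervised losses.
    return sup_loss + lambda_c * cons_loss
```

Only the student receives gradients; after each optimizer step, ema_update(teacher, student) refreshes the teacher from the student's weights.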
References
Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging 37(11), 2514–2525 (2018)
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Ibtehaz, N., Rahman, M.S.: MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 121, 74–87 (2020)
Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 30, 5574–5584 (2017)
Laine, S., Aila, T.: Temporal ensembling for semi-supervised learning. arXiv preprint arXiv:1610.02242 (2016)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 (2021)
Loshchilov, I., Hutter, F.: SGDR: stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)
Luo, X.: SSL4MIS (2020). https://github.com/HiLab-git/SSL4MIS
Oktay, O., et al.: Attention U-Net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016)
Qiao, S., Shen, W., Zhang, Z., Wang, B., Yuille, A.: Deep co-training for semi-supervised image recognition. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11219, pp. 142–159. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01267-0_9
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Strudel, R.: Segmenter (2021). https://github.com/rstrudel/segmenter
Strudel, R., Garcia, R., Laptev, I., Schmid, C.: Segmenter: transformer for semantic segmentation. arXiv preprint arXiv:2105.05633 (2021)
Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1195–1204 (2017)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Vu, T.H., Jain, H., Bucher, M., Cord, M., Pérez, P.: ADVENT: adversarial entropy minimization for domain adaptation in semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2517–2526 (2019)
Wang, Z., Voiculescu, I.: Quadruple augmented pyramid network for multi-class COVID-19 segmentation via CT. In: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (2021)
Wang, Z., Zhang, Z., Voiculescu, I.: RAR-U-Net: a residual encoder to attention decoder by residual connections framework for spine segmentation under noisy labels. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 21–25. IEEE (2021)
Wightman, R.: Pytorch image models (2019). https://github.com/rwightman/pytorch-image-models. https://doi.org/10.5281/zenodo.4414861
Yu, L., Wang, S., Li, X., Fu, C.-W., Heng, P.-A.: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 605–613. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_67
Zhang, Y., Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z.: Deep adversarial networks for biomedical image segmentation utilizing unannotated images. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 408–416. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_47
Zhang, Z., Li, S., Wang, Z., Lu, Y.: A novel and efficient tumor detection framework for pancreatic cancer via CT images. In: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pp. 1160–1164. IEEE (2020)
A Appendix
Table 4 gives detailed systematic IOU results on the MRI Cardiac test set under different assumptions about the ratio of labeled to total data. Remarkably, serviceable results are obtained with a proportion of labelled data as small as 1%, 2%, or 3% of the total. The small sets of type-specific annotations that do exist can thus be put to good use: our proposed method pairs them with large amounts of unlabeled data.
Table 5 and Table 6 report different approaches to updating, at each training iteration, the threshold \(\tau \) that the uncertainty estimation scheme uses to separate certain from uncertain pixels, and the weight \(\lambda \) of the consistency loss \(\mathcal {L}_\mathrm{c}\). We explore a fixed value, exponential ramp up [7], linear ramp up, cosine ramp down [9], and variants of them; exponential ramp up, linear ramp up and cosine ramp down are defined in Eqs. 8, 9 and 10, respectively. In each experiment, one of \(\tau \) and \(\lambda \) follows the approach under test while the other is fixed to exponential ramp up. The results show that none of these approaches to updating \(\tau \) and \(\lambda \) significantly improves the performance of the proposed method, so all other experiments use exponential ramp up for both \(\tau \) and \(\lambda \).
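Equations 8–10 are standard schedules; the sketch below states them in Python under the usual definitions, i.e. the exponential ramp up of [7] as \(\exp (-5(1-t/T)^2)\) and the cosine ramp down following the SGDR schedule of [9]. The argument names step, total_steps and max_val are assumptions for illustration.

```python
import math

def exp_rampup(step, total_steps, max_val):
    # Exponential ramp up (as in temporal ensembling [7]):
    # rises from near 0 to max_val following exp(-5 * (1 - t/T)^2).
    t = min(step, total_steps) / total_steps
    return max_val * math.exp(-5.0 * (1.0 - t) ** 2)

def linear_rampup(step, total_steps, max_val):
    # Linear ramp up: grows proportionally with training progress.
    return max_val * min(step, total_steps) / total_steps

def cosine_rampdown(step, total_steps, max_val):
    # Cosine ramp down (following the SGDR schedule [9]):
    # decays from max_val to 0 along half a cosine period.
    t = min(step, total_steps) / total_steps
    return max_val * 0.5 * (1.0 + math.cos(math.pi * t))
```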
Figure 4 shows randomly selected raw images with their corresponding uncertainty maps and the masks generated by the proposed method at three stages (from the beginning to the end) of training. In the uncertainty maps, yellow marks pixels where the teacher ViT \(f_\mathrm{t}\) is uncertain of its prediction, and blue marks pixels where it is certain. As Fig. 4 shows, the uncertainty maps gradually shift from yellow towards green during training. Thresholding the uncertainty map with the certainty estimation threshold yields the masks: white marks pixels where the teacher ViT \(f_\mathrm{t}\) is certain enough to guide the student ViT \(f_\mathrm{s}\), i.e. the pixels used when calculating the consistency loss \(\mathcal {L}_\mathrm{c}\), while black marks uncertain pixels that are temporarily excluded from the consistency semi-supervision loss. Note that background and ROI pixels can both be certain (white) at the same time. Typical example masks in Fig. 4 show that the model is uncertain only at the boundary of the ROI; with a proper threshold setting, the framework eventually becomes certain about the whole image, so by the end of training the uncertainty map turns blue and the mask turns white.
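For concreteness, here is a hedged sketch of how such an uncertainty map and mask could be computed, in the style of the uncertainty-aware self-ensembling model of Yu et al. [23]: Monte Carlo dropout over several stochastic forward passes yields a per-pixel predictive entropy, which is then thresholded with \(\tau \). The number of passes n_passes=8 and the helper names are assumptions, not the authors' exact implementation.

```python
import torch

def uncertainty_map(teacher, x, n_passes=8):
    # Monte Carlo dropout: keep dropout active and average several
    # stochastic forward passes, then measure predictive entropy per pixel.
    teacher.train()                      # enables dropout at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(teacher(x), dim=1)
                             for _ in range(n_passes)]).mean(dim=0)
    teacher.eval()
    # Predictive entropy: high where the teacher is unsure (yellow),
    # low where it is confident (blue).
    return -(probs * torch.log(probs + 1e-8)).sum(dim=1, keepdim=True)

def certainty_mask(teacher, x, tau, n_passes=8):
    # White (1) where the teacher is certain enough to guide the student;
    # black (0) where the pixel is temporarily excluded from the loss.
    return (uncertainty_map(teacher, x, n_passes) < tau).float()
```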
Cite this paper
Wang, Z., Zheng, J.Q., Voiculescu, I.: An uncertainty-aware transformer for MRI cardiac semantic segmentation via mean teachers. In: Yang, G., Aviles-Rivero, A., Roberts, M., Schönlieb, C.B. (eds.) Medical Image Understanding and Analysis. MIUA 2022. LNCS, vol. 13413. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-12053-4_37