Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation

Lu, Yixing; Fan, Zhaoxin; Xu, Min

doi:10.1007/978-3-031-53308-2_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14555)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In this paper, we introduce a semi-supervised learning framework tailored for medical image segmentation. Central to our approach is the Multi-scale Text-aware ViT-CNN Fusion scheme, which combines the strengths of ViTs and CNNs and exploits the complementary information in the vision and language modalities. We further propose the Multi-Axis Consistency framework for generating robust pseudo labels, thereby enhancing the semi-supervised learning process. Experiments on several widely used datasets demonstrate the efficacy of our approach.
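
The abstract does not specify how the Multi-Axis Consistency framework derives its pseudo labels. As a rough illustration of the general idea behind consistency-based pseudo-labelling only, the sketch below averages per-pixel foreground probabilities from several predictors and keeps a pixel as a pseudo label only when the averaged prediction is confident. The function name `consensus_pseudo_labels`, the `IGNORE` marker, and the 0.8 threshold are all hypothetical choices made for this example; this is not the authors' method.

```python
from typing import List, Sequence

IGNORE = -1  # hypothetical marker for pixels left out of the unsupervised loss


def consensus_pseudo_labels(
    prob_maps: Sequence[Sequence[float]],
    threshold: float = 0.8,  # assumed confidence cut-off, not from the paper
) -> List[int]:
    """Average foreground probabilities from several predictors and keep
    only the pixels where the averaged prediction is confident."""
    if not prob_maps:
        raise ValueError("need at least one probability map")
    n_pixels = len(prob_maps[0])
    if any(len(m) != n_pixels for m in prob_maps):
        raise ValueError("all probability maps must have the same length")

    labels = []
    for i in range(n_pixels):
        # Mean foreground probability across all predictors for pixel i.
        mean_fg = sum(m[i] for m in prob_maps) / len(prob_maps)
        # Confidence of the more likely class (foreground or background).
        confidence = max(mean_fg, 1.0 - mean_fg)
        if confidence >= threshold:
            labels.append(1 if mean_fg >= 0.5 else 0)
        else:
            labels.append(IGNORE)  # predictors disagree or are unsure
    return labels


if __name__ == "__main__":
    # Two hypothetical predictors, three pixels (flattened image).
    maps = [[0.9, 0.1, 0.6], [0.95, 0.05, 0.4]]
    print(consensus_pseudo_labels(maps))
```

Pixels on which the predictors disagree are marked `IGNORE` rather than forced to a class, so that only agreed-upon, confident predictions would supervise the unlabeled data.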

Z. Fan—Equal Contribution.

We thank Bowen Wei for helpful discussions on this work.

Author information

Authors and Affiliations

University of Liverpool, Liverpool, UK
Yixing Lu
Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Zhaoxin Fan & Min Xu

Corresponding author

Correspondence to Min Xu.

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Stevan Rudinac
Delft University of Technology, Delft, The Netherlands
Alan Hanjalic
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Reykjavik University, Reykjavik, Iceland
Björn Þór Jónsson
Microsoft Research Lab – Asia, Beijing, China
Bei Liu
The University of Tokyo, Tokyo, Japan
Yoko Yamakata

About this paper

Cite this paper

Lu, Y., Fan, Z., Xu, M. (2024). Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_11

DOI: https://doi.org/10.1007/978-3-031-53308-2_11
Published: 28 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53307-5
Online ISBN: 978-3-031-53308-2
eBook Packages: Computer Science, Computer Science (R0)
