Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation

  • Conference paper
MultiMedia Modeling (MMM 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14555))

Abstract

In this paper, we introduce a semi-supervised learning framework for medical image segmentation. Central to our approach is the Multi-scale Text-aware ViT-CNN Fusion scheme, which combines the strengths of vision transformers (ViTs) and convolutional neural networks (CNNs) and exploits the complementary information in vision-language modalities. We further propose the Multi-Axis Consistency framework for generating robust pseudo labels, which enhances the semi-supervised learning process. Extensive experiments on several widely used datasets demonstrate the efficacy of our approach.
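
The abstract does not spell out how pseudo labels are generated, so the following is only a generic sketch of consistency-based pseudo labelling, not the paper's Multi-Axis Consistency framework. It assumes two hypothetical branches (for example a CNN head and a ViT head) that each output per-pixel class probabilities; the function names, the 0.8 threshold, and the agreement rule are illustrative assumptions.

```python
import numpy as np


def consensus_pseudo_labels(prob_a, prob_b, threshold=0.8):
    """Build pseudo labels from two branches' class probabilities.

    prob_a, prob_b: arrays of shape (C, H, W) from two hypothetical branches.
    Returns (labels, mask): labels has shape (H, W); mask marks pixels where
    both branches agree and the averaged confidence reaches the threshold.
    This is an assumed, generic rule, not the method from the paper.
    """
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("branch outputs must have the same shape")
    mean_prob = (prob_a + prob_b) / 2.0
    labels = mean_prob.argmax(axis=0)
    confidence = mean_prob.max(axis=0)
    agree = prob_a.argmax(axis=0) == prob_b.argmax(axis=0)
    mask = agree & (confidence >= threshold)
    return labels, mask


def masked_cross_entropy(prob, labels, mask, eps=1e-8):
    """Cross-entropy of one branch against pseudo labels, on trusted pixels only."""
    prob = np.asarray(prob, dtype=float)
    # Pick, for every pixel, the probability assigned to its pseudo label.
    picked = np.take_along_axis(prob, labels[None, ...], axis=0)[0]
    if not mask.any():
        return 0.0
    return float(-(np.log(picked + eps) * mask).sum() / mask.sum())


# Toy example: 2 classes on a 1x2 image.
prob_a = np.array([[[0.9, 0.4]], [[0.1, 0.6]]])
prob_b = np.array([[[0.95, 0.7]], [[0.05, 0.3]]])
labels, mask = consensus_pseudo_labels(prob_a, prob_b)
loss = masked_cross_entropy(prob_a, labels, mask)
```

In this toy input the branches agree confidently on the first pixel and disagree on the second, so only the first pixel contributes to the unsupervised loss. Masking out disagreeing or low-confidence pixels is one common way to keep noisy pseudo labels from dominating training on unlabelled images.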

Z. Fan—Equal Contribution.

We thank Bowen Wei for helpful discussions on this work.

Author information

Corresponding author

Correspondence to Min Xu.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lu, Y., Fan, Z., Xu, M. (2024). Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_11

  • DOI: https://doi.org/10.1007/978-3-031-53308-2_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53307-5

  • Online ISBN: 978-3-031-53308-2

  • eBook Packages: Computer Science, Computer Science (R0)
