Abstract
In this paper, we introduce a semi-supervised learning framework tailored for medical image segmentation. Central to our approach is a Multi-scale Text-aware ViT-CNN Fusion scheme, which combines the strengths of ViTs and CNNs and exploits the complementary information in vision-language modalities. We further propose a Multi-Axis Consistency framework for generating robust pseudo labels, which strengthens the semi-supervised learning process. Extensive experiments on several widely used datasets demonstrate the efficacy of our approach.
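The abstract does not specify how the pseudo labels are computed, so the following is only a generic sketch of consistency-based pseudo labeling, not the paper's Multi-Axis Consistency framework. It assumes two prediction branches (for example a CNN and a ViT) that each output per-pixel class logits; the function names, the array shapes, and the 0.7 threshold are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(logits, axis=0):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def consistency_pseudo_labels(logits_a, logits_b, threshold=0.7):
    """Hypothetical helper: derive pseudo labels from two branches.

    logits_a, logits_b: arrays of shape (num_classes, H, W), assumed to
    come from two different branches run on the same unlabeled image.
    Returns (labels, mask): labels has shape (H, W); mask is True where
    both branches predict the same class and the averaged confidence
    reaches `threshold` (an illustrative value, not from the paper).
    """
    prob_a = softmax(logits_a)
    prob_b = softmax(logits_b)
    mean_prob = (prob_a + prob_b) / 2.0

    labels = mean_prob.argmax(axis=0)
    agree = prob_a.argmax(axis=0) == prob_b.argmax(axis=0)
    confident = mean_prob.max(axis=0) >= threshold
    return labels, agree & confident


# Toy input: 2 classes, a 1x2 image. The branches agree on the first
# pixel and disagree on the second.
logits_cnn = np.array([[[4.0, 0.0]], [[0.0, 0.1]]])
logits_vit = np.array([[[3.0, 0.2]], [[0.0, 0.0]]])
labels, mask = consistency_pseudo_labels(logits_cnn, logits_vit)
```

In a training loop of this kind, only the pixels selected by `mask` would typically contribute to the unsupervised loss, so that disagreement between branches does not propagate noisy labels.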
Z. Fan—Equal Contribution.
We thank Bowen Wei for helpful discussions on this work.
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lu, Y., Fan, Z., Xu, M. (2024). Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_11
DOI: https://doi.org/10.1007/978-3-031-53308-2_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53307-5
Online ISBN: 978-3-031-53308-2
eBook Packages: Computer Science, Computer Science (R0)