Abstract
Ultrasound image segmentation remains challenging due to blurred boundaries and morphological heterogeneity, while existing deep learning methods rely heavily on costly expert annotations. To address these issues, this study proposes a semi-supervised segmentation algorithm, Double U-Net (W-Net), built on consistency regularization and a cross-teaching framework. Specifically, we introduce a Deeper Dual-output Fusion U-Net (DDFU-Net) designed for ultrasound-specific challenges: it strengthens multi-scale feature extraction by improving the backbone network, integrates a dual-output refinement (DOR) module, and incorporates a spatial feature calibration (SFC) module to optimize multi-scale feature fusion. The proposed network further pairs DDFU-Net with a lightweight Transformer, so that the CNN and the Transformer complement each other in local and global feature extraction and, through mutual end-to-end supervision, effectively exploit unlabeled data. Our method achieves competitive performance: (1) compared with other semi-supervised methods, it outperforms the second-best approach by 7.96% (BUSI, 20% labels) and 17.52% (BUSI, 10% labels), and by 5.47% (GCUI, 20% labels) and 6.08% (GCUI, 10% labels); and (2) compared with a fully supervised U-Net, it raises the Dice score by 6.09%/3.86% (BUSI) and 3.89%/4.42% (GCUI) under the 10%/20% label conditions. These results demonstrate that the method effectively leverages unlabeled data and extracts rich feature information to improve the interpretation of complex medical images, particularly in low-data scenarios.
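For illustration, the cross-teaching objective summarized above can be sketched as a single training step; this is a minimal PyTorch-style example under stated assumptions, not the authors' exact training code. The branch names, the loss weight `lambda_u`, and the use of hard pseudo-labels are assumptions for the sketch.

```python
import torch.nn.functional as F

def cross_teaching_step(cnn, transformer, x_l, y_l, x_u, lambda_u=0.3):
    """One hedged cross-teaching step: cnn and transformer are two segmentation
    branches; x_l/y_l is a labeled batch, x_u an unlabeled batch."""
    # Supervised loss: both branches learn from the small labeled subset.
    sup = F.cross_entropy(cnn(x_l), y_l) + F.cross_entropy(transformer(x_l), y_l)

    # Cross supervision on unlabeled images: each branch is trained on the
    # other branch's (detached) hard pseudo-labels, so the CNN and the
    # Transformer teach each other end to end.
    logits_c, logits_t = cnn(x_u), transformer(x_u)
    pseudo_t = logits_t.detach().argmax(dim=1)  # pseudo-labels from the Transformer
    pseudo_c = logits_c.detach().argmax(dim=1)  # pseudo-labels from the CNN
    unsup = F.cross_entropy(logits_c, pseudo_t) + F.cross_entropy(logits_t, pseudo_c)

    return sup + lambda_u * unsup
```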






Data availability
No datasets were generated or analyzed during the current study.
Acknowledgements
This work was supported by the Joint Funds for the Innovation of Science and Technology, Fujian Province, under Grant No. 2023Y9123.
Author information
Contributions
Huabiao Zhou, Yanmin Luo and Jingjing Guo conceptualized and designed the study. Jingjing Guo, Zhikui Chen, Minling Zhuo, Youjia Lin, Weiwei Lin and Qingling Shen collected and annotated the dataset. Huabiao Zhou carried out the experiments and collected additional data. Huabiao Zhou, Wanyuan Gong and Zhongwei Lin conducted the data analysis and interpreted the results. Huabiao Zhou wrote the initial manuscript draft, and Yanmin Luo, Jingjing Guo and Zhikui Chen reviewed and edited the manuscript. Funding was provided by Jingjing Guo, Zhikui Chen, Minling Zhuo and Qingling Shen, who also supervised the project.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical approval
This study was approved by the Ethics Committee of Fujian Medical University Union Hospital, with the approval number 2023KY115. The research was conducted using anonymized medical ultrasound images, adhering to data privacy and security standards.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1
To enhance methodological transparency and reproducibility, this appendix provides complementary module details (Tables 5 and 6) and pseudo-code (Algorithm 1) for our framework.
Table 5 details the tensor transformations in the DOR modules at different depths. These transformations follow a symmetric pattern: the features are first downsampled until they reach the scale of the intermediate fusion layer (4 × 4), after which upsampling begins to further refine them. At the same time, each module outputs its intermediate-layer features to preserve multi-scale information. The intermediate fusion layer preceding the decoding stage gathers the intermediate outputs of all DOR modules together with the final encoder feature \(x_{7}\); its tensor is \(F_{fusion}=\mathrm{concat}([X_{F_1},\ldots ,X_{F_5},x_7])\in \mathbb {R}^{2304\times 4\times 4}\).
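As a minimal sketch of this fusion step (assuming PyTorch; the channel widths below are illustrative values chosen only so the concatenation yields 2304 channels, not the widths listed in Table 5):

```python
import torch

def build_fusion(dor_outputs, x7):
    """dor_outputs: list of 5 intermediate DOR features, each (B, C_i, 4, 4);
    x7: deepest encoder feature (B, C_7, 4, 4)."""
    assert all(f.shape[-2:] == (4, 4) for f in dor_outputs + [x7])
    # Channel-wise concatenation forms the intermediate fusion tensor F_fusion.
    return torch.cat(dor_outputs + [x7], dim=1)

# Example with assumed widths summing to 2304 channels.
feats = [torch.randn(2, c, 4, 4) for c in (64, 128, 256, 512, 832)]
x7 = torch.randn(2, 512, 4, 4)
print(build_fusion(feats, x7).shape)  # torch.Size([2, 2304, 4, 4])
```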
The workflow of the SFC (spatial feature calibration) module is illustrated in Table 6. During the decoding phase, two multi-scale features with distinct resolutions are fed into the SFC module. The convolutional layers first compute spatial offsets for both low-resolution and high-resolution features. These offsets are then utilized to perform grid sampling-based correction on the corresponding features. Furthermore, the gated adaptive fusion component activates horizontal and vertical feature offsets to enable context-aware refinement. Ultimately, the calibrated features are generated by adaptively fusing the rectified multi-scale representations.
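A hedged PyTorch sketch of this workflow is given below. The layer shapes, the shared offset/gate convolutions, the sigmoid gating, and the assumption that both inputs carry the same channel count are illustrative choices rather than the module's exact configuration (see Table 6).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SFCSketch(nn.Module):
    """Illustrative SFC-style block: predict offsets, warp both features by
    grid sampling, then fuse them with learned gates."""
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(2 * channels, 4, kernel_size=3, padding=1)  # two (dx, dy) fields
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)    # per-branch fusion weights

    @staticmethod
    def warp(feat, offset):
        # Displace a normalized sampling grid by the predicted offsets and
        # resample the feature map (the grid sampling-based correction).
        b, _, h, w = feat.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1).to(feat)
        grid = base + offset.permute(0, 2, 3, 1)
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, low_res, high_res):
        low_up = F.interpolate(low_res, size=high_res.shape[-2:], mode="bilinear", align_corners=True)
        pair = torch.cat((low_up, high_res), dim=1)
        off = self.offset(pair)                       # spatial offsets for both branches
        low_cal = self.warp(low_up, off[:, :2])
        high_cal = self.warp(high_res, off[:, 2:])
        g = torch.sigmoid(self.gate(pair))            # gated adaptive fusion
        return g[:, :1] * low_cal + g[:, 1:] * high_cal
```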
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, H., Luo, Y., Guo, J. et al. Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer’s U-shaped network. J Supercomput 81, 659 (2025). https://doi.org/10.1007/s11227-025-07152-7