Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer’s U-shaped network

The Journal of Supercomputing

Abstract

Ultrasound image segmentation remains challenging due to blurred boundaries and morphological heterogeneity, while existing deep learning methods rely heavily on costly expert annotations. To address these issues, this study proposes a semi-supervised learning algorithm called Double U-Net (W-Net), built on consistency regularization and a cross-teaching framework. Specifically, we introduce a Deeper Dual-output Fusion U-Net (DDFU-Net) designed to tackle ultrasound-specific challenges. This architecture enhances multi-scale feature extraction by improving the backbone network, integrating a dual-output refinement (DOR) module, and incorporating a spatial feature calibration (SFC) module to optimize multi-scale feature fusion. Furthermore, the proposed network combines DDFU-Net with a lightweight Transformer, enabling CNNs and Transformers to complement each other in local and global feature extraction. Through mutual end-to-end supervision, the method effectively leverages unlabeled data. Our method achieves competitive performance: (1) compared to other semi-supervised methods, it outperforms the second-best method by 7.96% (BUSI, 20% labels) and 17.52% (BUSI, 10% labels), and by 5.47% (GCUI, 20% labels) and 6.08% (GCUI, 10% labels); and (2) compared to fully supervised U-Net, it raises Dice by 6.09%/3.86% (BUSI) and 3.89%/4.42% (GCUI) under 10%/20% label conditions. These results demonstrate the ability to effectively leverage unlabeled data, extracting rich feature information that enhances the model's interpretation of complex medical images, particularly in low-data scenarios.



Data availability

No datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported by the Joint Funds for the Innovation of Science and Technology, Fujian province under Grant No. 2023Y9123.

Author information


Contributions

Huabiao Zhou, Yanmin Luo and Jingjing Guo conceptualized and designed the study. Jingjing Guo, Zhikui Chen, Minling Zhuo, Youjia Lin, Weiwei Lin and Qingling Shen collected and annotated the dataset. Huabiao Zhou carried out the experiments and collected additional data. Huabiao Zhou, Wanyuan Gong and Zhongwei Lin conducted the data analysis and interpreted the results. Huabiao Zhou wrote the initial manuscript draft, and Yanmin Luo, Jingjing Guo and Zhikui Chen reviewed and edited the manuscript. Funding was provided by Jingjing Guo, Zhikui Chen, Minling Zhuo and Qingling Shen, who also supervised the project.

Corresponding author

Correspondence to Yanmin Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This study was approved by the Ethics Committee of Fujian Medical University Union Hospital, with the approval number 2023KY115. The research was conducted using anonymized medical ultrasound images, adhering to data privacy and security standards.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

To enhance methodological transparency and reproducibility, this appendix provides complementary module details (Tables 5 and 6) and pseudo-code (Algorithm 1) for our framework.

As shown in Table 5, the tensor transformations in the DOR modules at different depths follow a symmetric pattern: features are first downsampled until they reach the scale of the intermediate fusion layer (4 × 4), after which upsampling begins to further refine them. Simultaneously, intermediate-layer features are output to preserve multi-scale information. The features of the intermediate fusion layer before the decoding stage consist of the intermediate outputs of each DOR module together with the final feature \(x_7\) from the encoding stage, giving \(F_{\text{fusion}} = \operatorname{concat}([X_{F_1}, \ldots, X_{F_5}, x_7]) \in \mathbb{R}^{2304 \times 4 \times 4}\).
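To make the fusion step concrete, the following is a minimal PyTorch sketch of the channel-wise concatenation. The per-branch channel counts (384 each) are illustrative assumptions; the text above only fixes the concatenated total at 2304 channels.

import torch

# Minimal sketch of the intermediate fusion layer. The 384-channel
# split per branch is assumed for illustration; only the 2304-channel
# total and the 4 x 4 spatial scale come from the description above.
B = 1                                                         # batch size
dor_outputs = [torch.randn(B, 384, 4, 4) for _ in range(5)]   # X_F1 ... X_F5
x7 = torch.randn(B, 384, 4, 4)                                # final encoder feature x_7

# Channel-wise concatenation: F_fusion lies in R^(2304 x 4 x 4) per sample.
f_fusion = torch.cat(dor_outputs + [x7], dim=1)
assert f_fusion.shape == (B, 2304, 4, 4)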

The workflow of the SFC (spatial feature calibration) module is illustrated in Table 6. During the decoding phase, two multi-scale features with distinct resolutions are fed into the SFC module. Convolutional layers first compute spatial offsets for both the low-resolution and high-resolution features, and these offsets are then used to perform grid-sampling-based correction on the corresponding features. The gated adaptive fusion component then activates horizontal and vertical feature offsets to enable context-aware refinement. Finally, the calibrated features are generated by adaptively fusing the rectified multi-scale representations.
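The following is a minimal PyTorch sketch of this offset-prediction, grid-sampling, and gated-fusion pipeline. The layer sizes and the exact gating formula are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SFCSketch(nn.Module):
    # Sketch of the SFC workflow: predict per-pixel offsets for each
    # resolution branch, warp each branch by grid sampling, then fuse
    # the calibrated features with a learned gate (all sizes assumed).
    def __init__(self, channels: int):
        super().__init__()
        self.offset_low = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.offset_high = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(2 * channels, 1, kernel_size=1)

    @staticmethod
    def _warp(feat, offset):
        # Build a normalized identity grid and shift it by the (dx, dy) offsets.
        b, _, h, w = feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat.device),
            torch.linspace(-1, 1, w, device=feat.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = grid + offset.permute(0, 2, 3, 1)   # (B, H, W, 2)
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, low, high):
        # Bring the low-resolution feature up to the high-resolution grid.
        low_up = F.interpolate(low, size=high.shape[-2:],
                               mode="bilinear", align_corners=True)
        # Grid-sampling-based correction of both branches.
        low_cal = self._warp(low_up, self.offset_low(low_up))
        high_cal = self._warp(high, self.offset_high(high))
        # Gated adaptive fusion of the calibrated features.
        g = torch.sigmoid(self.gate(torch.cat([low_cal, high_cal], dim=1)))
        return g * low_cal + (1 - g) * high_cal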

Table 5 DOR modules at different depths
Table 6 Detailed operations of SFC module
Algorithm 1 W-Net: cross-teaching with CNN and transformer
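As a minimal sketch of the cross-teaching step, the CNN branch (standing in for DDFU-Net) and the Transformer branch exchange hard pseudo-labels on unlabeled images, in addition to supervised training on the labeled subset. The cross-entropy losses and the weight lam are illustrative assumptions, not the paper's exact configuration.

import torch.nn.functional as F

def cross_teaching_step(cnn, transformer, x_lab, y_lab, x_unlab, lam=0.5):
    # Supervised term: both branches learn from the labeled subset.
    sup = F.cross_entropy(cnn(x_lab), y_lab) + \
          F.cross_entropy(transformer(x_lab), y_lab)

    # Cross teaching: each branch's hard prediction on the unlabeled
    # images serves as the pseudo-label for the other branch.
    logits_c = cnn(x_unlab)
    logits_t = transformer(x_unlab)
    pseudo_c = logits_c.argmax(dim=1).detach()
    pseudo_t = logits_t.argmax(dim=1).detach()
    unsup = F.cross_entropy(logits_c, pseudo_t) + \
            F.cross_entropy(logits_t, pseudo_c)

    return sup + lam * unsup

Because the pseudo-labels are detached, each branch is supervised by the other's predictions without gradients flowing across branches, which is what lets the two architectures regularize one another on unlabeled data.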

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, H., Luo, Y., Guo, J. et al. Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer’s U-shaped network. J Supercomput 81, 659 (2025). https://doi.org/10.1007/s11227-025-07152-7
