Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer’s U-shaped network

The Journal of Supercomputing

Abstract

Ultrasound image segmentation remains challenging due to blurred boundaries and morphological heterogeneity, while existing deep learning methods rely heavily on costly expert annotations. To address these issues, this study proposes a semi-supervised learning algorithm called Double U-Net (W-Net), built on consistency regularization and a cross-teaching framework. Specifically, we introduce a Deeper Dual-output Fusion U-Net (DDFU-Net) designed to tackle ultrasound-specific challenges. This architecture enhances multi-scale feature extraction by improving the backbone network, integrating a dual-output refinement (DOR) module, and incorporating a spatial feature calibration (SFC) module to optimize multi-scale feature fusion. Furthermore, the proposed network combines DDFU-Net with a lightweight Transformer, enabling CNNs and Transformers to complement each other in local and global feature extraction. Through mutual end-to-end supervision, the method effectively leverages unlabeled data. Our method achieves competitive performance: (1) compared to other semi-supervised methods, it outperforms the second-best method by 7.96% (BUSI, 20% labels) and 17.52% (BUSI, 10% labels), and by 5.47% (GCUI, 20% labels) and 6.08% (GCUI, 10% labels); and (2) compared to fully supervised U-Net, it raises Dice by 6.09%/3.86% (BUSI) and 3.89%/4.42% (GCUI) under 10%/20% label conditions. These results demonstrate the ability to effectively leverage unlabeled data, extracting rich feature information that enhances the model's interpretation of complex medical images, particularly in low-data scenarios.



Data availability

No datasets were generated or analyzed during the current study.


Acknowledgements

This work was supported by the Joint Funds for the Innovation of Science and Technology, Fujian province under Grant No. 2023Y9123.

Author information


Contributions

Huabiao Zhou, Yanmin Luo and Jingjing Guo conceptualized and designed the study. Jingjing Guo, Zhikui Chen, Minling Zhuo, Youjia Lin, Weiwei Lin and Qingling Shen collected and annotated the dataset. Huabiao Zhou carried out the experiments and collected additional data. Huabiao Zhou, Wanyuan Gong and Zhongwei Lin conducted the data analysis and interpreted the results. Huabiao Zhou wrote the initial manuscript draft, and Yanmin Luo, Jingjing Guo and Zhikui Chen reviewed and edited the manuscript. Funding was provided by Jingjing Guo, Zhikui Chen, Minling Zhuo and Qingling Shen, who also supervised the project.

Corresponding author

Correspondence to Yanmin Luo.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

This study was approved by the Ethics Committee of Fujian Medical University Union Hospital, with the approval number 2023KY115. The research was conducted using anonymized medical ultrasound images, adhering to data privacy and security standards.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

To enhance methodological transparency and reproducibility, this appendix provides complementary module details (Tables 5 and 6) and pseudo-code (Algorithm 1) for our framework.

As shown in Table 5, the tensor transformations in the DOR modules at different depths follow a symmetric pattern: features are first downsampled until they reach the scale of the intermediate fusion layer (4 × 4), after which upsampling begins to further refine them. Simultaneously, intermediate-layer features are output to preserve multi-scale information. The features of the intermediate fusion layer before the decoding stage consist of the intermediate outputs of each DOR module together with the final feature \(x_7\) from the encoding stage, giving \(F_{\text{fusion}} = \operatorname{concat}([X_{F_1}, \ldots, X_{F_5}, x_7]) \in \mathbb{R}^{2304 \times 4 \times 4}\).
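To make the fusion step concrete, the following is a minimal PyTorch sketch of the channel-wise concatenation. The per-branch channel counts (384 each) are illustrative assumptions; the text above only fixes the concatenated total at 2304 channels.

import torch

# Minimal sketch of the intermediate fusion layer. The 384-channel
# split per branch is assumed for illustration; only the 2304-channel
# total and the 4 x 4 spatial scale come from the description above.
B = 1                                                         # batch size
dor_outputs = [torch.randn(B, 384, 4, 4) for _ in range(5)]   # X_F1 ... X_F5
x7 = torch.randn(B, 384, 4, 4)                                # final encoder feature x_7

# Channel-wise concatenation: F_fusion lies in R^(2304 x 4 x 4) per sample.
f_fusion = torch.cat(dor_outputs + [x7], dim=1)
assert f_fusion.shape == (B, 2304, 4, 4)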

The workflow of the SFC (spatial feature calibration) module is illustrated in Table 6. During the decoding phase, two multi-scale features with distinct resolutions are fed into the SFC module. Convolutional layers first compute spatial offsets for both the low-resolution and high-resolution features, and these offsets are then used to perform grid-sampling-based correction on the corresponding features. The gated adaptive fusion component then activates horizontal and vertical feature offsets to enable context-aware refinement. Finally, the calibrated features are generated by adaptively fusing the rectified multi-scale representations.
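The following is a minimal PyTorch sketch of this offset-prediction, grid-sampling, and gated-fusion pipeline. The layer sizes and the exact gating formula are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SFCSketch(nn.Module):
    # Sketch of the SFC workflow: predict per-pixel offsets for each
    # resolution branch, warp each branch by grid sampling, then fuse
    # the calibrated features with a learned gate (all sizes assumed).
    def __init__(self, channels: int):
        super().__init__()
        self.offset_low = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.offset_high = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        self.gate = nn.Conv2d(2 * channels, 1, kernel_size=1)

    @staticmethod
    def _warp(feat, offset):
        # Build a normalized identity grid and shift it by the (dx, dy) offsets.
        b, _, h, w = feat.shape
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat.device),
            torch.linspace(-1, 1, w, device=feat.device),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = grid + offset.permute(0, 2, 3, 1)   # (B, H, W, 2)
        return F.grid_sample(feat, grid, align_corners=True)

    def forward(self, low, high):
        # Bring the low-resolution feature up to the high-resolution grid.
        low_up = F.interpolate(low, size=high.shape[-2:],
                               mode="bilinear", align_corners=True)
        # Grid-sampling-based correction of both branches.
        low_cal = self._warp(low_up, self.offset_low(low_up))
        high_cal = self._warp(high, self.offset_high(high))
        # Gated adaptive fusion of the calibrated features.
        g = torch.sigmoid(self.gate(torch.cat([low_cal, high_cal], dim=1)))
        return g * low_cal + (1 - g) * high_cal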

Table 5 DOR modules at different depths
Table 6 Detailed operations of SFC module
Algorithm 1 W-Net: cross-teaching with CNN and transformer
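As a minimal sketch of the cross-teaching step, the CNN branch (standing in for DDFU-Net) and the Transformer branch exchange hard pseudo-labels on unlabeled images, in addition to supervised training on the labeled subset. The cross-entropy losses and the weight lam are illustrative assumptions, not the paper's exact configuration.

import torch.nn.functional as F

def cross_teaching_step(cnn, transformer, x_lab, y_lab, x_unlab, lam=0.5):
    # Supervised term: both branches learn from the labeled subset.
    sup = F.cross_entropy(cnn(x_lab), y_lab) + \
          F.cross_entropy(transformer(x_lab), y_lab)

    # Cross teaching: each branch's hard prediction on the unlabeled
    # images serves as the pseudo-label for the other branch.
    logits_c = cnn(x_unlab)
    logits_t = transformer(x_unlab)
    pseudo_c = logits_c.argmax(dim=1).detach()
    pseudo_t = logits_t.argmax(dim=1).detach()
    unsup = F.cross_entropy(logits_c, pseudo_t) + \
            F.cross_entropy(logits_t, pseudo_c)

    return sup + lam * unsup

Because the pseudo-labels are detached, each branch is supervised by the other's predictions without gradients flowing across branches, which is what lets the two architectures regularize one another on unlabeled data.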

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, H., Luo, Y., Guo, J. et al. Double U-Net: semi-supervised ultrasound image segmentation combining CNN and transformer’s U-shaped network. J Supercomput 81, 659 (2025). https://doi.org/10.1007/s11227-025-07152-7
