Abstract
Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. Recently, Transformers have achieved great success in various medical image segmentation tasks, as they capture long-range dependencies that convolutional neural networks (CNNs) struggle to model. However, self-attention in the Transformer treats a 2D image as a 1D sequence, which destroys the crucial 2D structure of the image; it also models adaptability only in the spatial dimension while ignoring adaptability in the channel dimension. Moreover, Transformers can suffer from limited localization ability due to insufficient low-level detail. To address these problems, this paper proposes the Depth-wise Separable Convolutional Attention U-shape network (DW-SCA Unet), a U-shaped encoder-decoder network for medical image segmentation. The depth-wise separable convolutional attention block combines the advantages of depth-wise separable convolution and the Transformer, including long-range dependence, local structural information, and adaptability. Specifically, we use depth-wise separable convolutional attention blocks in the encoder to extract contextual features, and design a symmetric decoder in which depth-wise separable convolutional attention blocks with patch expanding layers perform up-sampling to restore the spatial resolution of the feature maps. On the skip connections of the U-shaped network, we use Channel-wise Cross-Attention (CCA) to guide the fused multi-scale channel-wise information to the decoder features, eliminating ambiguity. Experimental results show that DW-SCA Unet achieves strong performance on the Synapse multi-organ segmentation task, with a Dice similarity coefficient (DSC) of 80.54% and a Hausdorff distance (HD) of 20.02. Experiments on the multi-organ and cardiac segmentation tasks further demonstrate the superiority, effectiveness, and robustness of DW-SCA Unet.
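The depth-wise separable convolution at the core of the DW-SCA block factorizes a standard convolution into a per-channel spatial filter (depth-wise step) followed by a 1x1 cross-channel mixing (point-wise step), which is what yields the block's spatial and channel adaptability at a fraction of the parameter cost. The following minimal NumPy sketch illustrates the factorization only; it is not the authors' implementation, and all names and shapes are illustrative:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Minimal depth-wise separable convolution (illustrative sketch).

    x           : (C, H, W) input feature map
    dw_kernels  : (C, k, k) one spatial kernel per channel (depth-wise step)
    pw_weights  : (C_out, C) 1x1 point-wise mixing across channels
    Uses 'valid' convolution, stride 1, no padding.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    Ho, Wo = H - k + 1, W - k + 1

    # Depth-wise step: each channel is filtered independently,
    # capturing local spatial structure without mixing channels.
    dw = np.empty((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])

    # Point-wise step: a 1x1 convolution mixes information across channels.
    return np.tensordot(pw_weights, dw, axes=([1], [0]))  # (C_out, Ho, Wo)

# Parameter count versus a standard convolution of the same shape:
C_in, C_out, k = 64, 128, 3
standard_params = C_out * C_in * k * k           # 64 * 128 * 9  = 73728
separable_params = C_in * k * k + C_out * C_in   # 576 + 8192    = 8768
```

The roughly 8x parameter reduction in this configuration is the standard motivation for the factorization (cf. the MobileNets and Xception references); the DW-SCA block builds its convolutional attention on top of this primitive.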
Data availability
The Synapse dataset that supports the findings of this study is available in the "Synapse Storage" repository at https://doi.org/10.7303/syn3193805 (Synapse ID: syn3193805). The ACDC dataset analysed during the current study is available in the "Automated Cardiac Diagnosis Challenge (ACDC)" repository and can be accessed online at https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61602528).
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Cite this article
Zhou, Y., Tian, W., Zhang, Y. et al. DW-SCA Unet: medical image segmentation based on depth-wise separable convolutional attention U-shaped network. Multimed Tools Appl 83, 8893–8910 (2024). https://doi.org/10.1007/s11042-023-15960-3