Abstract
Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. Recently, Transformers have achieved great success in various medical image segmentation tasks, as they capture long-range dependencies that convolutional neural networks (CNNs) struggle to model. However, self-attention in the Transformer treats a 2D image as a 1D sequence, which destroys the crucial 2D structure of the image; it also models adaptability only in the spatial dimension while ignoring adaptability in the channel dimension. Moreover, Transformers can suffer from limited localization ability due to insufficient low-level detail. To address these problems, this paper proposes the Depth-wise Separable Convolutional Attention U-shape network (DW-SCA Unet), a U-shaped encoder-decoder network for medical image segmentation. The depth-wise separable convolutional attention block combines the advantages of depth-wise separable convolution and the Transformer, including long-range dependence, local structural information, and adaptability. Specifically, we use depth-wise separable convolutional attention blocks in the encoder to extract contextual features, and design a symmetric decoder in which depth-wise separable convolutional attention blocks with patch expanding layers perform up-sampling to restore the spatial resolution of the feature maps. On the skip connections of the U-shaped network, we use Channel-wise Cross-Attention (CCA) to guide the fused multi-scale channel-wise information to the decoder features, eliminating ambiguity. Experimental results show that DW-SCA Unet achieves strong performance on the Synapse multi-organ segmentation task, with a Dice similarity coefficient (DSC) of 80.54% and a Hausdorff distance (HD) of 20.02. Experiments on the multi-organ and cardiac segmentation tasks further demonstrate the superiority, effectiveness, and robustness of DW-SCA Unet.
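The depth-wise separable convolution at the core of the DW-SCA block factorizes a standard convolution into a per-channel spatial filter (depth-wise step) followed by a 1x1 cross-channel mixing (point-wise step), which is what yields the block's spatial and channel adaptability at a fraction of the parameter cost. The following minimal NumPy sketch illustrates the factorization only; it is not the authors' implementation, and all names and shapes are illustrative:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Minimal depth-wise separable convolution (illustrative sketch).

    x           : (C, H, W) input feature map
    dw_kernels  : (C, k, k) one spatial kernel per channel (depth-wise step)
    pw_weights  : (C_out, C) 1x1 point-wise mixing across channels
    Uses 'valid' convolution, stride 1, no padding.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[1]
    Ho, Wo = H - k + 1, W - k + 1

    # Depth-wise step: each channel is filtered independently,
    # capturing local spatial structure without mixing channels.
    dw = np.empty((C, Ho, Wo))
    for c in range(C):
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])

    # Point-wise step: a 1x1 convolution mixes information across channels.
    return np.tensordot(pw_weights, dw, axes=([1], [0]))  # (C_out, Ho, Wo)

# Parameter count versus a standard convolution of the same shape:
C_in, C_out, k = 64, 128, 3
standard_params = C_out * C_in * k * k           # 64 * 128 * 9  = 73728
separable_params = C_in * k * k + C_out * C_in   # 576 + 8192    = 8768
```

The roughly 8x parameter reduction in this configuration is the standard motivation for the factorization (cf. the MobileNets and Xception references); the DW-SCA block builds its convolutional attention on top of this primitive.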
Data availability
The Synapse dataset that supports the findings of this study is available in the "Synapse Storage" repository at https://doi.org/10.7303/syn3193805 (Synapse ID: syn3193805). The ACDC dataset analysed during the current study is available in the "Automated Cardiac Diagnosis Challenge (ACDC)" repository and can be accessed online at https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61602528).
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Cite this article
Zhou, Y., Tian, W., Zhang, Y. et al. DW-SCA Unet: medical image segmentation based on depth-wise separable convolutional attention U-shaped network. Multimed Tools Appl 83, 8893–8910 (2024). https://doi.org/10.1007/s11042-023-15960-3