
DW-SCA Unet: medical image segmentation based on depth-wise separable convolutional attention U-shaped network

Published in: Multimedia Tools and Applications

Abstract

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. Recently, the Transformer has achieved great success in various medical image segmentation tasks because, unlike a convolutional neural network (CNN), it captures long-range dependencies. However, self-attention in the Transformer treats a 2D image as a 1D sequence, which destroys the image's crucial 2D structure; it is also adaptive only in the spatial dimension and ignores adaptability in the channel dimension. Moreover, the Transformer can suffer from limited localization ability due to insufficient low-level detail. To address these problems, this paper proposes the Depth-wise Separable Convolutional Attention U-shaped network (DW-SCA Unet), a U-shaped encoder-decoder network for medical image segmentation. The depth-wise separable convolutional attention block combines the advantages of depth-wise separable convolution and the Transformer, including long-range dependence, local structural information and adaptability. Specifically, we use depth-wise separable convolutional attention blocks in the encoder to extract contextual features, and a symmetric depth-wise separable convolutional attention block with a patch expanding layer performs up-sampling to restore the spatial resolution of the feature maps. On the skip connections of the U-shaped network, we use Channel-wise Cross-Attention (CCA) to guide the fused multi-scale channel-wise information to the decoder features, eliminating ambiguity. Experimental results show that DW-SCA Unet achieves better performance on the Synapse multi-organ segmentation task, with segmentation accuracies of 80.54% (DSC) and 20.02 (HD). Experiments on the multi-organ and cardiac segmentation tasks also demonstrate the superiority, effectiveness and robustness of DW-SCA Unet.
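The two metrics quoted above can be computed for binary masks with a short sketch. This is a minimal illustration, not the paper's evaluation code: the function names are our own, the toy masks are invented, and published Synapse results typically report HD in millimetres using voxel spacing (often the 95th-percentile variant) rather than the plain pixel-space distance shown here.

```python
import numpy as np

def dice_score(pred, target):
    """Dice similarity coefficient (DSC) for binary masks, in [0, 1]."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    return 2.0 * np.logical_and(pred, target).sum() / denom if denom else 1.0

def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance (HD) between the foreground pixel
    sets of two binary masks, in pixel units (no voxel spacing)."""
    a = np.argwhere(pred.astype(bool))
    b = np.argwhere(target.astype(bool))
    # Pairwise Euclidean distances between the two coordinate sets.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy example: two 3x3 squares, offset by one pixel, overlapping in 4 pixels.
pred = np.zeros((5, 5), dtype=int)
target = np.zeros((5, 5), dtype=int)
pred[1:4, 1:4] = 1
target[2:5, 2:5] = 1
print(round(dice_score(pred, target), 3))   # 2*4 / (9+9) -> 0.444
print(hausdorff_distance(pred, target))
```

Higher DSC and lower HD are better; DSC rewards volumetric overlap, while HD penalizes the single worst boundary error, which is why both are reported together.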


Data availability

The Synapse dataset that supports the findings of this study is available in the “Synapse Storage” repository at https://doi.org/10.7303/syn3193805 (Synapse ID: syn3193805). The ACDC dataset analysed during the current study is available in the “Automated Cardiac Diagnosis Challenge (ACDC)” repository at https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61602528).

Author information

Corresponding author

Correspondence to Chuzheng Wang.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, Y., Tian, W., Zhang, Y. et al. DW-SCA Unet: medical image segmentation based on depth-wise separable convolutional attention U-shaped network. Multimed Tools Appl 83, 8893–8910 (2024). https://doi.org/10.1007/s11042-023-15960-3

