
SwinE-UNet3+: swin transformer encoder network for medical image segmentation

  • Short Communication
  • Published in: Progress in Artificial Intelligence

Abstract

The SwinE-UNet3+ model is proposed to address two weaknesses of convolutional neural networks in tumor segmentation tasks: the limited receptive field, which prevents them from capturing long-range feature dependencies, and their insensitivity to contour details. Each encoder layer of SwinE-UNet3+ uses two consecutive Swin Transformer blocks to extract features, in particular long-range features in images. Patch merging performs down-sampling between encoder layers. The decoder uses Conv2DTranspose for progressive up-sampling and applies convolution to aggregate the up-sampled decoder features with the encoder features passed through skip connections. The proposed model is evaluated on the TipDM Cup rectal cancer dataset and the ISIC-2017 melanoma dermoscopic image dataset. Experimental results show that SwinE-UNet3+ outperforms the UNet, UNet++, and UNet3+ models in Dice coefficient, IoU, and precision.
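The patch-merging down-sampling used between encoder layers, and the Dice/IoU metrics reported in the evaluation, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation; the helper names `patch_merging` and `dice_and_iou` are hypothetical, and the projection matrix `w` stands in for a learned linear layer.

```python
import numpy as np

def patch_merging(x, w):
    """Swin-style patch merging: group each 2x2 patch neighborhood,
    concatenate along channels (C -> 4C), then project to 2C.
    x: feature map of shape (H, W, C); w: projection of shape (4C, 2C),
    which would be a learned weight in a real network."""
    # Gather the four spatial neighbors of every 2x2 block.
    merged = np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )  # shape (H/2, W/2, 4C)
    return merged @ w  # shape (H/2, W/2, 2C): half resolution, double channels

def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    dice = 2 * inter / (pred.sum() + target.sum() + eps)
    iou = inter / (union + eps)
    return dice, iou

# Example: a 4x4 feature map with 3 channels halves in spatial
# resolution and doubles in channels after patch merging.
x = np.random.rand(4, 4, 3)
w = np.random.rand(4 * 3, 2 * 3)
print(patch_merging(x, w).shape)  # (2, 2, 6)

# Identical prediction and ground-truth masks give Dice = IoU = 1.
mask = np.array([[1, 1], [0, 1]])
dice, iou = dice_and_iou(mask, mask)
```

The resolution halving with channel doubling mirrors the hierarchical feature pyramid that makes Swin-based encoders a drop-in replacement for convolutional encoder stages.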



Author information
Corresponding author

Correspondence to Jian-Sheng Wu.



Cite this article

Zou, P., Wu, JS. SwinE-UNet3+: swin transformer encoder network for medical image segmentation. Prog Artif Intell 12, 99–105 (2023). https://doi.org/10.1007/s13748-023-00300-1
