Abstract
A SwinE-UNet3+ model is proposed to address two limitations of convolutional neural networks in tumor segmentation tasks: their limited receptive field prevents them from capturing long-range feature dependencies, and they are insensitive to contour details. Each encoder layer of SwinE-UNet3+ uses two consecutive Swin Transformer blocks to extract features, especially long-range features in images, and Patch Merging is used for down-sampling between encoder layers. The decoder performs progressive up-sampling with Conv2DTranspose and uses a convolution operation to aggregate the up-sampled decoder features with the encoder features passed through skip connections. The proposed model is evaluated on the TipDM Cup rectal cancer dataset and the ISIC-2017 melanoma dermoscopy dataset. Experimental results show that the SwinE-UNet3+ model outperforms the UNet, UNet++ and UNet3+ models in Dice coefficient, IoU and Precision.
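The decoder step described above (transposed-convolution up-sampling, skip concatenation, then convolutional aggregation) can be sketched as a single stage. This is a minimal illustration written in PyTorch, not the authors' implementation: the class name, channel counts, and the BatchNorm/ReLU aggregation head are assumptions for the sketch (the paper's Conv2DTranspose corresponds to `nn.ConvTranspose2d` here).

```python
import torch
import torch.nn as nn


class DecoderStage(nn.Module):
    """One illustrative SwinE-UNet3+-style decoder stage:
    2x transposed-conv up-sampling, concatenation with the
    encoder skip connection, then convolutional aggregation."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # learned 2x up-sampling of the deeper decoder features
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # convolution that fuses up-sampled and skip-connected features
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                   # (N, out_ch, 2H, 2W)
        x = torch.cat([x, skip], dim=1)  # skip connection from the encoder
        return self.fuse(x)


# Example: fuse 16x16 decoder features with 32x32 encoder features.
stage = DecoderStage(in_ch=192, skip_ch=96, out_ch=96)
dec = torch.randn(1, 192, 16, 16)  # deeper decoder features
enc = torch.randn(1, 96, 32, 32)   # same-resolution encoder features
out = stage(dec, enc)
print(tuple(out.shape))  # → (1, 96, 32, 32)
```

The channel sizes (192 → 96) mirror the halving that Patch Merging doubles on the encoder side, but any consistent values work.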
Cite this article
Zou, P., Wu, J.S.: SwinE-UNet3+: swin transformer encoder network for medical image segmentation. Prog. Artif. Intell. 12, 99–105 (2023). https://doi.org/10.1007/s13748-023-00300-1