
MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14437)


Abstract

We propose MS-UNet, a novel U-Net model for medical image segmentation. In place of the single-layer U-Net decoder used in Swin-UNet and TransUNet, we design a multi-scale nested decoder based on the Swin Transformer. The framework is motivated by the observation that the single-layer decoder of U-Net is too “thin” to exploit enough information, leaving a large semantic gap between the encoder and decoder. The problem worsens when the training set is small, which is common in medical image processing, where annotated data are harder to obtain than in most other domains. The proposed multi-scale nested decoder brings the decoder feature maps semantically closer to those of the encoder, enabling the network to learn more detailed features. Experimental results show that MS-UNet learns features more efficiently and achieves stronger performance, especially in the extreme case of very little training data. The code is publicly available at: https://github.com/HH446/MS-UNet.
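To make the nested-decoder idea concrete, the sketch below shows a minimal multi-scale nested decoder in the spirit of UNet++-style dense skip pathways: each decoder node fuses an upsampled deeper feature with all previous same-resolution nodes, so the features being merged with encoder outputs stay semantically closer to them. This is an illustrative assumption, not the authors' implementation: plain convolutional blocks stand in for Swin Transformer blocks, and the channel widths, depth, and class names (ConvBlock, NestedDecoder) are invented for the example; see the linked repository for the actual MS-UNet code.

# Minimal sketch of a multi-scale nested decoder (UNet++-style dense skips).
# Assumption: plain Conv2d blocks stand in for Swin Transformer blocks; channel
# widths and depth are illustrative, not the MS-UNet configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Two 3x3 convolutions; a stand-in for a Swin Transformer decoder block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class NestedDecoder(nn.Module):
    """Nested decoder over encoder features [x0, x1, x2, x3] (fine -> coarse).

    Node (i, j) fuses the upsampled node (i+1, j-1) with all previous
    same-resolution nodes (i, 0..j-1), keeping decoder features semantically
    close to the encoder features they are merged with.
    """
    def __init__(self, chs=(96, 192, 384, 768)):
        super().__init__()
        self.depth = len(chs)
        self.nodes = nn.ModuleDict()
        for j in range(1, self.depth):            # decoder "column"
            for i in range(self.depth - j):       # resolution level
                in_ch = chs[i] * j + chs[i + 1]   # j same-level inputs + one upsampled deeper input
                self.nodes[f"{i}_{j}"] = ConvBlock(in_ch, chs[i])

    def forward(self, feats):
        grid = {f"{i}_0": f for i, f in enumerate(feats)}   # column 0 = encoder outputs
        for j in range(1, self.depth):
            for i in range(self.depth - j):
                up = F.interpolate(grid[f"{i + 1}_{j - 1}"], scale_factor=2,
                                   mode="bilinear", align_corners=False)
                skips = [grid[f"{i}_{k}"] for k in range(j)]
                grid[f"{i}_{j}"] = self.nodes[f"{i}_{j}"](torch.cat(skips + [up], dim=1))
        return grid[f"0_{self.depth - 1}"]        # finest-resolution output of the last column


if __name__ == "__main__":
    # Dummy encoder pyramid: 56x56 down to 7x7, Swin-like channel widths.
    feats = [torch.randn(1, c, 56 // 2 ** i, 56 // 2 ** i) for i, c in enumerate((96, 192, 384, 768))]
    out = NestedDecoder()(feats)
    print(out.shape)  # torch.Size([1, 96, 56, 56])

The dense same-level connections are what the paper's argument hinges on: by the time a decoder node is fused with an encoder feature, it has already aggregated several intermediate representations at that resolution, which is the "semantically closer" property the example is meant to illustrate.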


References

  1. Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv:2105.05537 (2021)

  2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. arXiv:2005.12872 (2020)

  3. Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv:2102.04306 (2021)

  4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848

  5. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (2021)

  6. Fu, S., et al.: Domain adaptive relational reasoning for 3D multi-organ segmentation. arXiv:2005.09120 (2020)

  7. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. arXiv:2111.06377 (2021)

  8. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. arXiv:1911.05722 (2020)

  9. Huang, H., et al.: UNet 3+: a full-scale connected UNet for medical image segmentation. arXiv:2004.08790 (2020)

  10. Jha, D., Riegler, M.A., Johansen, D., Halvorsen, P., Johansen, H.D.: DoubleU-Net: a deep convolutional neural network for medical image segmentation. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 558–564. IEEE, Rochester, MN, USA (2020). https://doi.org/10.1109/CBMS49503.2020.00111

  11. Kuang, H., Liang, Y., Liu, N., Liu, J., Wang, J.: BEA-SegNet: body and edge aware network for medical image segmentation. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 939–944. IEEE, Houston, TX, USA (2021). https://doi.org/10.1109/BIBM52615.2021.9669545

  12. Lin, A., Chen, B., Xu, J., Zhang, Z., Lu, G.: DS-TransUNet: dual Swin Transformer U-Net for medical image segmentation. arXiv:2106.06716 (2021)

  13. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. arXiv:2103.14030 (2021)

  14. Oktay, O., et al.: Attention U-Net: learning where to look for the pancreas. arXiv:1804.03999 (2018)

  15. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28


  16. Vaswani, A., et al.: Attention is all you need. arXiv:1706.03762 (2017)

  17. Wang, H., et al.: Mixed Transformer U-Net for medical image segmentation. arXiv:2111.04734 (2021)

  18. Xiao, X., Lian, S., Luo, Z., Li, S.: Weighted Res-UNet for high-quality retina vessel segmentation. In: 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pp. 327–331 (2018). https://doi.org/10.1109/ITME.2018.00080

  19. Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412 (2018)


  20. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6230–6239 (2017). https://doi.org/10.1109/CVPR.2017.660

  21. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: a nested u-net architecture for medical image segmentation. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS -2018. LNCS, vol. 11045, pp. 3–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_1



Author information


Corresponding author

Correspondence to Kuan Li.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chen, H., Han, Y., Li, Y., Xu, P., Li, K., Yin, J. (2024). MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_39


  • DOI: https://doi.org/10.1007/978-981-99-8558-6_39


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8557-9

  • Online ISBN: 978-981-99-8558-6

  • eBook Packages: Computer Science, Computer Science (R0)
