
MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14437)


Abstract

We propose MS-UNet, a novel U-Net model for medical image segmentation. In place of the single-layer U-Net decoder used in Swin-UNet and TransUNet, we design a multi-scale nested decoder based on the Swin Transformer. The framework is motivated by the observation that the single-layer decoder of U-Net is too “thin” to exploit enough information, leaving a large semantic gap between the encoder and decoder. The problem worsens when the training set is small, which is common in medical image processing, where annotated data are harder to obtain than in most other domains. The proposed multi-scale nested decoder brings the decoder feature maps semantically closer to those of the encoder, enabling the network to learn more detailed features. Experimental results show that MS-UNet learns features more efficiently and achieves stronger performance, especially in the extreme case of very little training data. The code is publicly available at: https://github.com/HH446/MS-UNet.
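To make the nested-decoder idea concrete, the sketch below shows a minimal multi-scale nested decoder in the spirit of UNet++-style dense skip pathways: each decoder node fuses an upsampled deeper feature with all previous same-resolution nodes, so the features being merged with encoder outputs stay semantically closer to them. This is an illustrative assumption, not the authors' implementation: plain convolutional blocks stand in for Swin Transformer blocks, and the channel widths, depth, and class names (ConvBlock, NestedDecoder) are invented for the example; see the linked repository for the actual MS-UNet code.

# Minimal sketch of a multi-scale nested decoder (UNet++-style dense skips).
# Assumption: plain Conv2d blocks stand in for Swin Transformer blocks; channel
# widths and depth are illustrative, not the MS-UNet configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBlock(nn.Module):
    """Two 3x3 convolutions; a stand-in for a Swin Transformer decoder block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class NestedDecoder(nn.Module):
    """Nested decoder over encoder features [x0, x1, x2, x3] (fine -> coarse).

    Node (i, j) fuses the upsampled node (i+1, j-1) with all previous
    same-resolution nodes (i, 0..j-1), keeping decoder features semantically
    close to the encoder features they are merged with.
    """
    def __init__(self, chs=(96, 192, 384, 768)):
        super().__init__()
        self.depth = len(chs)
        self.nodes = nn.ModuleDict()
        for j in range(1, self.depth):            # decoder "column"
            for i in range(self.depth - j):       # resolution level
                in_ch = chs[i] * j + chs[i + 1]   # j same-level inputs + one upsampled deeper input
                self.nodes[f"{i}_{j}"] = ConvBlock(in_ch, chs[i])

    def forward(self, feats):
        grid = {f"{i}_0": f for i, f in enumerate(feats)}   # column 0 = encoder outputs
        for j in range(1, self.depth):
            for i in range(self.depth - j):
                up = F.interpolate(grid[f"{i + 1}_{j - 1}"], scale_factor=2,
                                   mode="bilinear", align_corners=False)
                skips = [grid[f"{i}_{k}"] for k in range(j)]
                grid[f"{i}_{j}"] = self.nodes[f"{i}_{j}"](torch.cat(skips + [up], dim=1))
        return grid[f"0_{self.depth - 1}"]        # finest-resolution output of the last column


if __name__ == "__main__":
    # Dummy encoder pyramid: 56x56 down to 7x7, Swin-like channel widths.
    feats = [torch.randn(1, c, 56 // 2 ** i, 56 // 2 ** i) for i, c in enumerate((96, 192, 384, 768))]
    out = NestedDecoder()(feats)
    print(out.shape)  # torch.Size([1, 96, 56, 56])

The dense same-level connections are what the paper's argument hinges on: by the time a decoder node is fused with an encoder feature, it has already aggregated several intermediate representations at that resolution, which is the "semantically closer" property the example is meant to illustrate.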


References

  1. Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv:2105.05537 (2021)

  2. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. arXiv:2005.12872 (2020)

  3. Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv:2102.04306 (2021)

  4. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848

  5. Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (2021)

  6. Fu, S., et al.: Domain adaptive relational reasoning for 3D multi-organ segmentation. arXiv:2005.09120 (2020)

  7. He, K., Chen, X., Xie, S., Li, Y., Dollár, P., Girshick, R.: Masked autoencoders are scalable vision learners. arXiv:2111.06377 (2021)

  8. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. arXiv:1911.05722 (2020)

  9. Huang, H., et al.: UNet 3+: a full-scale connected UNet for medical image segmentation. arXiv:2004.08790 (2020)

  10. Jha, D., Riegler, M.A., Johansen, D., Halvorsen, P., Johansen, H.D.: DoubleU-Net: a deep convolutional neural network for medical image segmentation. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 558–564. IEEE, Rochester, MN, USA (2020). https://doi.org/10.1109/CBMS49503.2020.00111

  11. Kuang, H., Liang, Y., Liu, N., Liu, J., Wang, J.: BEA-SegNet: body and edge aware network for medical image segmentation. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 939–944. IEEE, Houston, TX, USA (2021). https://doi.org/10.1109/BIBM52615.2021.9669545

  12. Lin, A., Chen, B., Xu, J., Zhang, Z., Lu, G.: DS-TransUNet: dual Swin Transformer U-Net for medical image segmentation. arXiv:2106.06716 (2021)

  13. Liu, Z., et al.: Swin Transformer: hierarchical vision transformer using shifted windows. arXiv:2103.14030 (2021)

  14. Oktay, O., et al.: Attention U-Net: learning where to look for the pancreas. arXiv:1804.03999 (2018)

  15. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28


  16. Vaswani, A., et al.: Attention is all you need. arXiv:1706.03762 (2017)

  17. Wang, H., et al.: Mixed Transformer U-Net for medical image segmentation. arXiv:2111.04734 (2021)

  18. Xiao, X., Lian, S., Luo, Z., Li, S.: Weighted Res-UNet for high-quality retina vessel segmentation. In: 2018 9th International Conference on Information Technology in Medicine and Education (ITME), pp. 327–331 (2018). https://doi.org/10.1109/ITME.2018.00080

  19. Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2403–2412 (2018)


  20. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6230–6239 (2017). https://doi.org/10.1109/CVPR.2017.660

  21. Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: a nested u-net architecture for medical image segmentation. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS -2018. LNCS, vol. 11045, pp. 3–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_1



Author information


Corresponding author

Correspondence to Kuan Li.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Chen, H., Han, Y., Li, Y., Xu, P., Li, K., Yin, J. (2024). MS-UNet: Swin Transformer U-Net with Multi-scale Nested Decoder for Medical Image Segmentation with Small Training Data. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14437. Springer, Singapore. https://doi.org/10.1007/978-981-99-8558-6_39


  • DOI: https://doi.org/10.1007/978-981-99-8558-6_39


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8557-9

  • Online ISBN: 978-981-99-8558-6

  • eBook Packages: Computer Science, Computer Science (R0)
