Abstract
UNet-based encoder-decoder networks have dominated volumetric medical image segmentation in recent years, with many improvements focusing on the design of encoders, decoders, and skip connections. Owing to the intrinsic locality of convolutional kernels, convolution-based encoders suffer from limited receptive fields. Recently proposed Transformer-based networks address this by building long-range dependencies with the self-attention mechanism; however, they rely heavily on weights pretrained on natural images. In this work, we find that the performance of ViT-based (Vision Transformer) models does not degrade significantly without pretrained weights, even when the data source is limited. We therefore flexibly design a 3D medical Transformer for image segmentation and train it from scratch. Specifically, we introduce Multi-Scale Dynamic Positional Embeddings into the ViT to dynamically acquire positional information for each 3D patch; this positional bias also enriches attention diversity. Moreover, after preliminary experiments on decoder design, we give detailed reasons for choosing a convolution-based decoder over the recently proposed Swin Transformer blocks. Finally, we propose the Context Enhancement Module, which refines skipped features by merging low- and high-frequency information through a combination of convolutional kernels and self-attention modules. Experiments show that, when trained from scratch, our model is comparable to nnUNet in segmentation performance on the Medical Segmentation Decathlon (Liver) and VerSe'20 datasets.
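Since the two modules above are only named in the abstract, the PyTorch sketch below gives one plausible, non-authoritative reading of them: a Multi-Scale Dynamic Positional Embedding realised as parallel depthwise 3D convolutions over the patch-token volume (in the spirit of conditional positional encodings), and a Context Enhancement Module that fuses a convolutional high-frequency branch with a self-attention low-frequency branch. All class names, kernel sizes, and the fusion scheme are our assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: the paper's actual module definitions are not
# reproduced here, and every name below is our own.
import torch
import torch.nn as nn


class MultiScaleDynamicPosEmbed(nn.Module):
    """Positional bias from parallel depthwise 3D convolutions at several
    kernel sizes; conditional on the input, so each 3D patch receives
    position information derived from its own neighbourhood."""

    def __init__(self, dim, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(dim, dim, k, padding=k // 2, groups=dim)  # depthwise
            for k in kernel_sizes
        )

    def forward(self, x):  # x: (B, C, D, H, W) volume of patch tokens
        return x + sum(branch(x) for branch in self.branches)


class ContextEnhancementModule(nn.Module):
    """Refines a skip feature by merging a convolutional (high-frequency,
    local-detail) branch with a self-attention (low-frequency, global
    context) branch before it is passed to the decoder."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv3d(dim, dim, 3, padding=1, groups=dim),  # depthwise
            nn.Conv3d(dim, dim, 1),                         # pointwise
        )
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Conv3d(2 * dim, dim, 1)  # merge the two branches

    def forward(self, x):  # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, DHW, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, d, h, w)
        return self.fuse(torch.cat([local, glob], dim=1))
```

For instance, ContextEnhancementModule(dim=96) applied to a (1, 96, 8, 8, 8) skip feature returns a refined tensor of the same shape; note that the embedding dimension must be divisible by num_heads.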
References
Antonelli, M., et al.: The medical segmentation decathlon. arXiv preprint arXiv:2106.05735 (2021)
Cao, H., et al.: Swin-Unet: Unet-like pure transformer for medical image segmentation. arXiv preprint arXiv:2105.05537 (2021)
Chen, J., et al.: TransUNet: transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306 (2021)
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49
Dosovitskiy, A., et al.: An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Gao, Y., Zhou, M., Metaxas, D.N.: UTNet: a hybrid transformer architecture for medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 61–71. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_6
Huang, Z., Wang, X., Huang, L., Huang, C., Wei, Y., Liu, W.: CCNet: criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 603–612 (2019)
Isensee, F., Jaeger, P.F., Kohl, S.A.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18(2), 203–211 (2021)
Ji, Y., et al.: Multi-compound transformer for accurate biomedical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12901, pp. 326–336. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87193-2_31
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
Oktay, O., et al.: Attention U-Net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
Park, N., Kim, S.: How do vision transformers work? In: International Conference on Learning Representations (2022)
Peiris, H., Hayat, M., Chen, Z., Egan, G., Harandi, M.: A volumetric transformer for accurate 3D tumor segmentation. arXiv preprint arXiv:2111.13300 (2021)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Sekuboyina, A., et al.: VerSe: a vertebrae labelling and segmentation benchmark. arXiv preprint arXiv:2001.09193 (2020)
Tang, Y., et al.: Self-supervised pre-training of swin transformers for 3D medical image analysis. arXiv preprint arXiv:2111.14791 (2021)
Valanarasu, J.M.J., Oza, P., Hacihaliloglu, I., Patel, V.M.: Medical transformer: gated axial-attention for medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12901, pp. 36–46. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87193-2_4
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Wu, Y., et al.: D-former: a U-shaped dilated transformer for 3D medical image segmentation. arXiv preprint arXiv:2201.00462 (2022)
Xiang, T., Zhang, C., Liu, D., Song, Y., Huang, H., Cai, W.: BiO-Net: learning recurrent bi-directional connections for encoder-decoder architecture. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 74–84. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_8
Xiao, T., Liu, Y., Zhou, B., Jiang, Y., Sun, J.: Unified perceptual parsing for scene understanding. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11209, pp. 432–448. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01228-1_26
Xu, Y., Zhang, Q., Zhang, J., Tao, D.: ViTAE: vision transformer advanced by exploring intrinsic inductive bias. In: Advances in Neural Information Processing Systems, vol. 34 (2021)
Zhou, H.-Y., Guo, J., Zhang, Y., Yu, L., Wang, L., Yu, Y.: nnFormer: interleaved transformer for volumetric segmentation. arXiv preprint arXiv:2109.03201 (2021)
Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: UNet++: a nested U-Net architecture for medical image segmentation. In: Stoyanov, D., et al. (eds.) DLMIA/ML-CDS 2018. LNCS, vol. 11045, pp. 3–11. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00889-5_1
Cite this paper
You, X., Gu, Y., He, J., Sun, H., Yang, J. (2022). A More Design-Flexible Medical Transformer for Volumetric Image Segmentation. In: Lian, C., Cao, X., Rekik, I., Xu, X., Cui, Z. (eds) Machine Learning in Medical Imaging. MLMI 2022. Lecture Notes in Computer Science, vol 13583. Springer, Cham. https://doi.org/10.1007/978-3-031-21014-3_7