
msFormer: Adaptive Multi-Modality 3D Transformer for Medical Image Segmentation

  • Conference paper
  • Conference: Pattern Recognition and Computer Vision (PRCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13535)

Abstract

In recent years, Convolutional Neural Networks (CNNs) have dominated medical image segmentation, but they struggle to represent long-range dependencies. Recently, the Transformer has been applied to medical image segmentation: architectures built on its self-attention mechanism, the core of the Transformer, can encode long-range dependencies in images with highly expressive learning capacity. In this paper, we introduce msFormer, an adaptive multi-modality 3D medical image segmentation network based on the Transformer, which also serves as a powerful 3D fusion network and extends the Transformer to multi-modality medical image segmentation. The fusion network is modeled as a U-shaped structure that exploits complementary features of different modalities at multiple scales, enriching the volumetric representations. We conducted a comprehensive experimental analysis on the Prostate and BraTS2021 datasets. Our method achieves average DSCs of 0.905 and 0.851 on these two datasets, respectively, outperforming existing state-of-the-art methods.
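Only the abstract is reproduced on this page, so the architecture details are not shown here. For intuition only, the sketch below illustrates the general idea the abstract describes: fusing token streams from two imaging modalities with cross-attention at one scale of a U-shaped 3D Transformer. Everything in it (the CrossModalFusion class, the tensor shapes, and the residual-plus-projection merge) is a hypothetical illustration, not the authors' msFormer implementation.

```python
# Hypothetical sketch (not the authors' code): cross-modality attention
# fusion at one scale of a U-shaped 3D Transformer. Tokens are flattened
# 3D patches; each modality attends to the other so complementary
# features are mixed before the next encoder stage.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)
        # Modality A queries modality B, and vice versa.
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)  # merge the two fused streams

    def forward(self, tok_a: torch.Tensor, tok_b: torch.Tensor) -> torch.Tensor:
        # tok_a, tok_b: (batch, num_patches, dim) token sequences,
        # one per imaging modality (e.g. two MRI sequences).
        a, b = self.norm_a(tok_a), self.norm_b(tok_b)
        fused_a, _ = self.attn_ab(a, b, b)   # A attends to B
        fused_b, _ = self.attn_ba(b, a, a)   # B attends to A
        return self.proj(torch.cat([tok_a + fused_a, tok_b + fused_b], dim=-1))

# Example: 2 volumes, 512 patch tokens of width 96 per modality.
fusion = CrossModalFusion(dim=96)
out = fusion(torch.randn(2, 512, 96), torch.randn(2, 512, 96))
print(out.shape)  # torch.Size([2, 512, 96])
```

In a full U-shaped network, a block like this would sit at each encoder scale, with the fused tokens passed both to the next stage and across skip connections to the decoder.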


Notes

  1. http://medicaldecathlon.com/.
  2. http://braintumorsegmentation.org/.
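The abstract reports results on these two datasets as average DSC (Dice similarity coefficient). For reference, here is a minimal sketch of the metric itself; the smoothing constant and binary-mask setting are assumptions, since the paper's exact evaluation protocol is not reproduced on this page.

```python
# Minimal Dice similarity coefficient (DSC) for binary 3D masks.
# The smoothing term and binary setting are illustrative assumptions;
# the paper's exact evaluation protocol is not shown on this page.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float(2.0 * inter / (pred.sum() + gt.sum() + eps))

# Example on a toy 3D volume: identical masks give DSC close to 1.0.
mask = np.zeros((8, 8, 8), dtype=bool)
mask[2:6, 2:6, 2:6] = True
print(dice(mask, mask))  # ~1.0
```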


Acknowledgment

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61901074 and 61902046), the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJQN201900636 and KJQN201900631), the China Postdoctoral Science Foundation (Grant No. 2021M693771), and the Chongqing Postgraduates' Innovation Project (CYS21310).

Author information

Corresponding author

Correspondence to Shenhai Zheng.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tan, J., Jiang, C., Li, L., Li, H., Li, W., Zheng, S. (2022). msFormer: Adaptive Multi-Modality 3D Transformer for Medical Image Segmentation. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13535. Springer, Cham. https://doi.org/10.1007/978-3-031-18910-4_26


  • DOI: https://doi.org/10.1007/978-3-031-18910-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-18909-8

  • Online ISBN: 978-3-031-18910-4

  • eBook Packages: Computer Science, Computer Science (R0)
