Abstract
3D Spatially Aligned Multi-modal MRI Brain Tumor Segmentation (SAMM-BTS) is a crucial task for clinical diagnosis. While Transformer-based models have achieved outstanding success in this field owing to their ability to model global features via the self-attention mechanism, they still face two challenges. First, because of its high computational complexity and its deficiencies in modeling local features, the traditional self-attention mechanism is ill-suited for SAMM-BTS tasks, which require modeling both global and local volumetric features within an acceptable computational overhead. Second, existing models merely stack spatially aligned multi-modal data along the channel dimension, without any dedicated processing of such multi-channel data in the model's internal design. To address these challenges, we propose a Transformer-based model for the SAMM-BTS task, named DBTrans, with dual-branch architectures for both the encoder and decoder. Specifically, the encoder implements two parallel feature extraction branches: a local branch based on Shifted Window Self-attention and a global branch based on Shuffle Window Cross-attention, capturing both local and global information with linear computational complexity. In addition, we add an extra global branch based on Shifted Window Cross-attention to the decoder, introducing the key and value matrices from the corresponding encoder block so that the segmented target can access a more complete context during up-sampling. Furthermore, the dual-branch designs in both the encoder and decoder are integrated with improved channel attention mechanisms to fully exploit the contribution of features at different channels. Experimental results demonstrate the superiority of DBTrans in both qualitative and quantitative measures. Code will be released at https://github.com/Aru321/DBTrans.
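To make the shuffle-window idea concrete, here is a minimal, hypothetical 1D sketch (the paper operates on 3D volumes): tokens are partitioned into non-overlapping windows, and a ShuffleNet-style transpose regroups them so that each shuffled window contains one token from every original window, letting window-local attention reach globally distributed positions at the cost of a single window. The function names and the 1D setting are our own illustration, not the authors' implementation.

```python
def window_partition(tokens, window_size):
    """Split a flat token sequence into non-overlapping windows."""
    assert len(tokens) % window_size == 0
    return [tokens[i:i + window_size] for i in range(0, len(tokens), window_size)]

def shuffle_windows(windows):
    """ShuffleNet-style transpose of the (num_windows, window_size) grid:
    shuffled window i collects the i-th token of every original window,
    so attention computed inside one shuffled window spans the whole
    sequence while its cost stays that of a single window."""
    return [list(group) for group in zip(*windows)]

tokens = list(range(8))               # 8 token positions
local = window_partition(tokens, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7]]
mixed = shuffle_windows(local)        # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Because the shuffle is a plain transpose, applying it twice restores the original grouping, which is what makes the operation cheap to undo after the attention step.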
X. Zeng and P. Zeng contributed equally to this work.
Acknowledgement
This work is supported by the National Natural Science Foundation of China (NSFC 62371325, 62071314), Sichuan Science and Technology Program 2023YFG0263, 2023YFG0025, 2023NSFSC0497, and Opening Foundation of Agile and Intelligent Computing Key Laboratory of Sichuan Province.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zeng, X., Zeng, P., Tang, C., Wang, P., Yan, B., Wang, Y. (2023). DBTrans: A Dual-Branch Vision Transformer for Multi-Modal Brain Tumor Segmentation. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14223. Springer, Cham. https://doi.org/10.1007/978-3-031-43901-8_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43900-1
Online ISBN: 978-3-031-43901-8
eBook Packages: Computer Science (R0)