Abstract
Medical image segmentation is crucial for lesion localization and surgical navigation. Recent advances in the field have been driven by Convolutional Neural Networks (CNNs) and Transformers. However, CNNs struggle to capture long-range dependencies because of their weight sharing and localized receptive fields, which makes organs of varying shape hard to handle. Transformers offer global receptive fields instead, but their spatial and computational complexity is prohibitively high for 3D medical images. To address this issue, we propose a novel series-parallel network that combines convolution and self-attention for 3D medical image segmentation. A serial 3D CNN encoder extracts multi-level feature maps, which are fused via a feature pyramid network; four parallel Transformer branches then capture global features. To model long-range information efficiently, we introduce patch self-attention, which divides the input into non-overlapping patches and computes attention between corresponding pixels across patches. Experiments on 3D MRI prostate and left atrial segmentation confirm that our network outperforms other CNN- and Transformer-based networks, achieving both higher segmentation accuracy and faster inference.
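The patch self-attention idea described above can be sketched as follows. This is a minimal 2D NumPy illustration under stated assumptions: identity Q/K/V projections, a single head, and a 2D feature map rather than the 3D volumes and learned projections the network actually uses. It is not the paper's implementation, only an illustration of restricting attention to corresponding pixels across non-overlapping patches:

```python
import numpy as np

def patch_self_attention(x, p=2):
    """Illustrative patch self-attention on a 2D feature map.

    x: (H, W, C) feature map; H and W must be divisible by p.
    The map is split into non-overlapping p x p patches, and attention
    is computed only among pixels occupying the same position within
    different patches, so each pixel attends to (H*W)/(p*p) tokens
    instead of all H*W, reducing the attention cost by a factor of p*p.
    """
    H, W, C = x.shape
    nh, nw = H // p, W // p
    # (nh, p, nw, p, C) -> (p*p, nh*nw, C): group tokens by within-patch position
    t = x.reshape(nh, p, nw, p, C).transpose(1, 3, 0, 2, 4).reshape(p * p, nh * nw, C)
    # scaled dot-product attention per within-patch position (identity Q/K/V)
    scores = t @ t.transpose(0, 2, 1) / np.sqrt(C)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ t
    # fold the attended sequences back onto the spatial grid
    return out.reshape(p, p, nh, nw, C).transpose(2, 0, 3, 1, 4).reshape(H, W, C)
```

With a patch size of p, each attention map covers only every p-th pixel in each direction, which is what keeps the cost manageable for volumetric inputs.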
Acknowledgment
This work is sponsored by the National Natural Science Foundation of China (Grant No. 61871440), China Postdoctoral Science Foundation (Grant No. 2023M731204), and CAAI-Huawei MindSpore Open Fund.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Yu, B., Zhou, Q., Zhang, X. (2024). SPCTNet: A Series-Parallel CNN and Transformer Network for 3D Medical Image Segmentation. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science(), vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_31
DOI: https://doi.org/10.1007/978-981-99-8850-1_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science (R0)