A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features

Zhu, Fazhan; Lv, Jiaxing; Lu, Kun; Wang, Wenyan; Cong, Hongshou; Zhang, Jun; Chen, Peng; Zhao, Yuan; Wu, Ziheng

doi:10.1007/978-3-031-13870-6_63

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13393)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Accurate segmentation of medical images provides a reliable basis for clinical diagnosis and pathology research and helps doctors make more accurate diagnoses, and deep learning can accelerate this process. Convolutional Neural Networks (CNNs) and Transformers have become the two mainstream deep learning architectures for medical image segmentation. However, the Transformer architecture has limited ability to capture local inductive bias and is at a disadvantage on small-sample datasets. Many theories and experiments show that these problems can be effectively addressed by fusing convolution and Transformer features. In this manuscript, a new U-shaped segmentation model based on convolution and the Swin Transformer framework, called CST-UNET, is proposed. Its encoder combines the advantages of dilated convolution and the Transformer, allowing the model to fully capture semantic inductive bias information and long-range information. It also has fewer parameters and lower FLOPs. Even when trained on a small-sample dataset, the framework retains strong generalization ability. On the BraTS2021 dataset, the Dice coefficients for ET, TC and WT are 85.46%, 89.38% and 92.35%, respectively, and the corresponding HD95 values are 7.95, 5.06 and 4.07.
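
The Dice coefficients reported for ET, TC and WT can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the usual BraTS label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor), which the abstract does not state, and it assumes that an empty region in both prediction and ground truth scores 1.0. The names `REGIONS`, `dice` and `region_dice` are hypothetical, and volumes are represented as flattened label sequences for simplicity.

```python
from typing import Dict, Sequence, Set

# Assumed BraTS label convention (not stated in the abstract):
# 1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor.
REGIONS: Dict[str, Set[int]] = {
    "ET": {4},        # enhancing tumor
    "TC": {1, 4},     # tumor core
    "WT": {1, 2, 4},  # whole tumor
}


def dice(pred: Sequence[int], truth: Sequence[int], labels: Set[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for the region made up of `labels`."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of voxels")
    p = [v in labels for v in pred]    # binary mask of the predicted region
    t = [v in labels for v in truth]   # binary mask of the reference region
    intersection = sum(a and b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumption: a region absent from both masks counts as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def region_dice(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice score for each of the three nested tumor regions."""
    return {name: dice(pred, truth, labels) for name, labels in REGIONS.items()}


# Toy example on six voxels (flattened label maps).
pred = [0, 1, 2, 4, 4, 0]
truth = [0, 1, 2, 2, 4, 4]
scores = region_dice(pred, truth)
print(scores)
```

Because the three regions are nested, a voxel mislabeled between edema and enhancing tumor lowers the ET and TC scores while leaving WT unchanged, which is one reason WT scores tend to be the highest of the three.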

Acknowledgement

This work was supported by the National Natural Science Foundation of China (Nos. 62172004, 62072002, and 61872004) and the Educational Commission of Anhui Province (No. KJ2019ZD05).

Author information

Authors and Affiliations

School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan, 243032, Anhui, China
Fazhan Zhu, Jiaxing Lv, Kun Lu, Wenyan Wang, Hongshou Cong, Yuan Zhao & Ziheng Wu
Key Laboratory of Metallurgical Emission Reduction and Resources Recycling (Anhui University of Technology), Ministry of Education, Ma’anshan, 243002, China
Kun Lu, Wenyan Wang, Yuan Zhao & Ziheng Wu
School of Materials Science and Engineering, Anhui University of Technology, Ma’anshan, 243032, Anhui, China
Wenyan Wang
School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, Anhui, China
Jun Zhang
National Engineering Research Center for Agro-Ecological Big Data Analysis and Application, School of Internet and Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
Peng Chen

Corresponding author

Correspondence to Ziheng Wu.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Xi'an Polytechnic University, Xi'an, China
Junfeng Jing
The University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Polytechnic of Bari, Bari, Italy
Vitoantonio Bevilacqua
Liverpool John Moores University, Liverpool, UK
Abir Hussain

About this paper

Cite this paper

Zhu, F. et al. (2022). A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_63

DOI: https://doi.org/10.1007/978-3-031-13870-6_63
Published: 15 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13869-0
Online ISBN: 978-3-031-13870-6
eBook Packages: Computer Science, Computer Science (R0)