Abstract
Accurate segmentation of medical images provides a reliable basis for clinical diagnosis and pathology research and helps doctors make more accurate diagnoses, and deep learning can accelerate this process. Convolutional Neural Networks (CNNs) and Transformers have become the two mainstream deep learning architectures for medical image segmentation. However, the Transformer architecture has a limited ability to capture local inductive bias and is at a disadvantage on small-sample datasets. Many theoretical and experimental studies show that these problems can be effectively addressed by fusing convolution and Transformer features. In this manuscript, we propose a new U-shaped segmentation model based on convolution and the Swin Transformer, called CST-UNET. Its encoder combines the advantages of dilated convolution and the Transformer, so that the model captures both semantic inductive bias information and long-range information. It also has fewer parameters and lower FLOPs. Even when trained on a small-sample dataset, the framework retains strong generalization ability. On the BraTS2021 dataset, the Dice coefficients for ET, TC and WT are 85.46%, 89.38% and 92.35%, respectively, and the corresponding HD95 values are 7.95, 5.06 and 4.07.
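The Dice coefficients reported above measure the overlap between a predicted and a reference mask for each tumor region. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes the usual BraTS label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor), which the abstract does not state, and the names `REGIONS`, `dice` and `region_dice` as well as the toy label values are hypothetical.

```python
# Assumed BraTS label convention (not stated in the abstract):
# 1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor.
REGIONS = {
    "ET": {4},        # enhancing tumor
    "TC": {1, 4},     # tumor core
    "WT": {1, 2, 4},  # whole tumor
}


def dice(pred, target, labels):
    """Dice = 2|P & T| / (|P| + |T|) over voxels whose label is in `labels`."""
    p = [v in labels for v in pred]
    t = [v in labels for v in target]
    intersection = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


def region_dice(pred, target):
    """Return the Dice score of each nested tumor region."""
    return {name: dice(pred, target, labels) for name, labels in REGIONS.items()}


# Toy flattened label volumes (hypothetical values, eight voxels).
pred = [0, 1, 1, 2, 2, 4, 4, 0]
target = [0, 1, 2, 2, 2, 4, 0, 0]
scores = region_dice(pred, target)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Because the three regions are nested, a single mislabeled voxel can lower the ET score sharply while barely changing the WT score, which is one reason the regions are reported separately.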
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Nos. 62172004, 62072002, and 61872004) and the Educational Commission of Anhui Province (No. KJ2019ZD05).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, F. et al. (2022). A 3D Medical Image Segmentation Framework Fusing Convolution and Transformer Features. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13393. Springer, Cham. https://doi.org/10.1007/978-3-031-13870-6_63
DOI: https://doi.org/10.1007/978-3-031-13870-6_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13869-0
Online ISBN: 978-3-031-13870-6
eBook Packages: Computer Science, Computer Science (R0)