Abstract
Accurate segmentation of cardiac structures in magnetic resonance imaging (MRI) is essential for reliable diagnosis and management of cardiovascular disease. Although numerous robust models have been proposed, no single segmentation model consistently outperforms others across all cases, and models that excel on one dataset may not achieve similar accuracy on others or when the same dataset is expanded. This study introduces FCTransNet, an ensemble-based computer-aided diagnosis system that leverages the complementary strengths of Vision Transformer (ViT) models (specifically TransUNet, SwinUNet, and SegFormer) to address these challenges. To achieve this, we propose a novel pixel-level fusion technique, the Intelligent Weighted Summation Technique (IWST), which reconstructs the final segmentation mask by integrating the outputs of the ViT models and accounting for their diversity. First, a dedicated U-Net module isolates the region of interest (ROI) from cine MRI images, which is then processed by each ViT to generate preliminary segmentation masks. The IWST subsequently fuses these masks to produce a refined final segmentation. By using a local window around each pixel, IWST captures specific neighborhood details while incorporating global context to enhance segmentation accuracy. Experimental validation on the ACDC dataset shows that FCTransNet significantly outperforms individual ViTs and other deep learning-based methods, achieving a Dice Score (DSC) of 0.985 and a mean Intersection over Union (IoU) of 0.914 in the end-diastolic phase. In addition, FCTransNet maintains high accuracy in the end-systolic phase with a DSC of 0.989 and an IoU of 0.908. These results underscore FCTransNet’s ability to improve cardiac MRI segmentation accuracy.
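The abstract does not give the IWST formula, so the following is only a minimal sketch of the general idea of pixel-level weighted fusion with a local window, together with the two reported metrics (Dice and IoU). It is not the authors' method. The names `fuse_masks`, `local_agreement`, `global_w`, and `r` are hypothetical. The weighting rule (a per-model global weight multiplied by the model's agreement with a majority-vote consensus inside a local window) is an assumption chosen for illustration.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask, 1 = foreground, 0 = background


def dice_and_iou(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of equal shape."""
    inter = sum(p & t for rp, rt in zip(pred, truth) for p, t in zip(rp, rt))
    n_pred = sum(map(sum, pred))
    n_true = sum(map(sum, truth))
    union = n_pred + n_true - inter
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0, 1.0
    return 2 * inter / (n_pred + n_true), inter / union


def local_agreement(mask: Mask, consensus: Mask, r: int, y: int, x: int) -> float:
    """Fraction of pixels in the window of radius r around (y, x)
    where `mask` matches `consensus` (window is clipped at the borders)."""
    h, w = len(mask), len(mask[0])
    hits = total = 0
    for j in range(max(0, y - r), min(h, y + r + 1)):
        for i in range(max(0, x - r), min(w, x + r + 1)):
            hits += mask[j][i] == consensus[j][i]
            total += 1
    return hits / total


def fuse_masks(masks: Sequence[Mask], global_w: Sequence[float], r: int = 1) -> Mask:
    """Fuse binary masks pixel by pixel (hypothetical rule, not IWST itself).

    Each model's vote at a pixel is weighted by its global weight times
    its local agreement with a provisional majority-vote consensus.
    """
    h, w = len(masks[0]), len(masks[0][0])
    # Provisional consensus: strict majority vote across models.
    consensus = [
        [int(2 * sum(m[y][x] for m in masks) > len(masks)) for x in range(w)]
        for y in range(h)
    ]
    fused: Mask = []
    for y in range(h):
        row = []
        for x in range(w):
            weights = [
                g * local_agreement(m, consensus, r, y, x)
                for m, g in zip(masks, global_w)
            ]
            total = sum(weights)
            score = (
                sum(wt * m[y][x] for wt, m in zip(weights, masks)) / total
                if total
                else 0.0
            )
            row.append(int(score >= 0.5))  # threshold the weighted vote
        fused.append(row)
    return fused
```

In this toy setup, a model that disagrees with its neighbors' consensus around a pixel contributes less to the vote at that pixel, which is one simple way to combine local detail with a global, per-model reliability term. In a real pipeline the global weights might come from validation Dice scores, and the inputs would be the ROI-cropped outputs of the three ViTs; both are assumptions here.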
Data availability
These data were obtained from https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html
Acknowledgements
We are grateful to the Direction Generale de la Recherche Scientifique et du Developpement Technologique (DGRSDT) which kindly supported this research, as well as to the Laboratoire de Gestion Electronique du Document (LabGED) where this study was conducted.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest regarding the publication of this paper.
About this article
Cite this article
Boukhamla, A., Azizi, N. & Belhaouari, S.B. Intelligent mask image reconstruction for cardiac image segmentation through local–global fusion. Appl Intell 55, 257 (2025). https://doi.org/10.1007/s10489-024-06085-7
DOI: https://doi.org/10.1007/s10489-024-06085-7