Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer

  • Original Article
  • Neural Computing and Applications

Abstract

Multi-modal medical image fusion (MMIF) is widely used in disease diagnosis and surgical guidance. Despite the popularity of deep learning (DL)-based fusion methods, these algorithms often fail to deliver satisfactory fusion performance because they struggle to capture local information and long-range dependencies effectively. To address these issues, this paper presents an unsupervised MMIF method that combines a densely-connected high-resolution network (DHRNet) with a hybrid transformer. In this method, local features are first extracted from the source images by the DHRNet. These features are then fed into the fine-grained attention module of the hybrid transformer, which produces global features by exploring their long-range dependencies. The local and global features are fused by the projection attention module of the hybrid transformer. Finally, the fused result is reconstructed from the fused features by the decoder network. The network is trained with an unsupervised loss function that combines the edge preservation value, structural similarity, the sum of the correlations of differences (SCD) and a structure tensor term. Experiments on various multi-modal medical images show that, compared with several traditional and DL-based fusion methods, the presented method generates visually better fused results and achieves better quantitative metric values.
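
The architecture description above maps naturally onto code. The sketch below is a minimal, illustrative PyTorch rendering of that flow, not the authors' implementation: DenseBlock stands in for the DHRNet encoder, plain multi-head self-attention (AttentionBlock) stands in for the fine-grained and projection attention modules, and fusion_loss uses simple gradient and intensity terms as rough proxies for the paper's edge preservation, structural similarity, SCD and structure tensor terms. All class names, channel widths and loss weights here are assumptions.

```python
# Illustrative sketch of the described fusion pipeline (assumed, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    """Densely-connected conv block: each layer sees all earlier feature maps."""
    def __init__(self, in_ch, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(in_ch + i * growth, growth, 3, padding=1) for i in range(layers))

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # in_ch + layers * growth channels

class AttentionBlock(nn.Module):
    """Stand-in for the fine-grained/projection attention modules: plain
    multi-head self-attention over flattened spatial positions."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.norm = nn.LayerNorm(ch)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)      # long-range dependencies
        tokens = self.norm(tokens + out)                # residual + LayerNorm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class FusionNet(nn.Module):
    def __init__(self, ch=48):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), DenseBlock(16))
        self.proj = nn.Conv2d(16 + 3 * 16, ch, 1)       # compress dense features
        self.global_attn = AttentionBlock(ch)           # global features
        self.fuse_attn = AttentionBlock(ch)             # mixes local + global features
        self.decoder = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, img_a, img_b):
        fa = self.proj(self.encoder(img_a))             # local features, modality A
        fb = self.proj(self.encoder(img_b))             # local features, modality B
        za = self.fuse_attn(fa + self.global_attn(fa))  # fuse local + global
        zb = self.fuse_attn(fb + self.global_attn(fb))
        return self.decoder(torch.cat([za, zb], dim=1))

def grad_mag(x):
    """Finite-difference gradient magnitude, padded back to the input size."""
    gx = F.pad((x[..., :, 1:] - x[..., :, :-1]).abs(), (0, 1, 0, 0))
    gy = F.pad((x[..., 1:, :] - x[..., :-1, :]).abs(), (0, 0, 0, 1))
    return gx + gy

def fusion_loss(fused, a, b, w_edge=1.0, w_int=1.0):
    """Simplified unsupervised loss: an edge term pushing the fused gradients
    toward the stronger source gradient at each pixel, plus an intensity term
    keeping the fused image close to both sources."""
    l_edge = F.l1_loss(grad_mag(fused), torch.maximum(grad_mag(a), grad_mag(b)))
    l_int = 0.5 * (F.l1_loss(fused, a) + F.l1_loss(fused, b))
    return w_edge * l_edge + w_int * l_int
```

A toy forward and backward pass over random grayscale source pairs:

```python
net = FusionNet()
mri, pet = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
fused = net(mri, pet)
fusion_loss(fused, mri, pet).backward()
```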

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant No. 61871440. The authors also acknowledge the medical ultrasound lab at Huazhong University of Science and Technology for providing the GPU computation platform.

Author information

Corresponding author

Correspondence to Xuming Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhou, Q., Ye, S., Wen, M. et al. Multi-modal medical image fusion based on densely-connected high-resolution CNN and hybrid transformer. Neural Comput & Applic 34, 21741–21761 (2022). https://doi.org/10.1007/s00521-022-07635-1
