Visible and thermal image fusion network with diffusion models for high-level visual tasks


Abstract

Fusion technology enhances applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving both human observation and machine vision tasks. However, most semantics-driven fusion algorithms train the segmentation and fusion tasks jointly, which increases computational cost while underutilizing semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is the key to addressing this issue. This paper proposes a two-stage RGB-T image fusion network with diffusion models. In the first stage, a diffusion model extracts multiscale features, providing rich semantic features and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) are proposed to improve the network's ability to describe small details, and an adaptive global-local attention mechanism (AGAM) enhances the weights of key features relevant to visual tasks. To benchmark the proposed algorithm, we created a new tri-modal sensor driving scene dataset (TSDS), which includes 15,234 sets of labeled images (visible, thermal, and degree-of-polarization images). A semantic segmentation model trained on our fused images achieved 78.41% accuracy, and an object detection model achieved 87.21% mAP. The experimental results indicate that our algorithm outperforms state-of-the-art image fusion algorithms.



Data Availability

The TSDS dataset is available at: https://www.kaggle.com/datasets/katrinawood/tsds-dataset


Acknowledgements

This work was funded by the International Cooperation Foundation of Jilin Province (20210402074GH) and the Autonomous Vehicle and Optoelectronic Instrument Innovation Project of Zhongshan City (CXTD2023002).

Author information

Authors and Affiliations

Authors

Contributions

Jin Meng: Conceptualization, Methodology, Software, Supervision, Validation, Writing - original draft. Jiahui Zou: Data curation, Writing - original draft. Zhuoheng Xiang: Data curation, Writing - original draft. Cui Wang: Writing - review & editing, Supervision. Shifeng Wang: Supervision, Funding acquisition. Yan Li: Writing - review & editing. Jonghyuk Kim: Writing - review & editing.

Corresponding author

Correspondence to Shifeng Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests or personal relationships that might have influenced the work reported in this article.

Ethics approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Meng, J., Zou, J., Xiang, Z. et al. Visible and thermal image fusion network with diffusion models for high-level visual tasks. Appl Intell 55, 286 (2025). https://doi.org/10.1007/s10489-024-06210-6

