Abstract
Image fusion enhances applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving both human observation and downstream visual tasks. However, most semantics-driven fusion algorithms train segmentation and fusion jointly, which increases computational cost while underutilizing semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is key to addressing this issue. This paper proposes a two-stage RGB-T image fusion network built on diffusion models. In the first stage, a diffusion model extracts multiscale features, providing rich semantics and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) are proposed to improve the network's ability to describe fine details, and an adaptive global-local attention mechanism (AGAM) strengthens the weights of key features relevant to visual tasks. To benchmark the proposed algorithm, we created a new tri-modal sensor driving scene dataset (TSDS) comprising 15,234 sets of labeled images (visible, thermal, and degree-of-polarization images). A semantic segmentation model trained on our fused images achieved 78.41% accuracy, and an object detection model achieved 87.21% mAP. The experimental results indicate that our algorithm outperforms state-of-the-art image fusion algorithms.
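The following is a minimal sketch of the two-stage pipeline described in the abstract, written in a PyTorch style. The module names (SFEM, DFEM, AGAM) follow the abstract, but their internal layers, channel widths, and the interface of the stage-one feature extractor are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a two-stage RGB-T fusion pipeline as outlined in the
# abstract. All layer choices below are assumptions, not the published code.
import torch
import torch.nn as nn

class SFEM(nn.Module):
    """Semantic feature enhancement module (internals assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.conv(x)  # residual refinement of deep semantic features

class DFEM(nn.Module):
    """Detail feature enhancement module (internals assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.conv(x)  # residual refinement of texture and edge details

class AGAM(nn.Module):
    """Adaptive global-local attention; a simple channel gate stands in here."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.gate(x)  # re-weight features relevant to the visual task

class TwoStageFusion(nn.Module):
    def __init__(self, feat_extractor, ch=64):
        super().__init__()
        # Stage 1: a pretrained diffusion-model feature extractor (interface assumed).
        self.feat_extractor = feat_extractor
        self.sfem, self.dfem, self.agam = SFEM(ch), DFEM(ch), AGAM(ch)
        self.head = nn.Conv2d(ch, 1, 3, padding=1)  # reconstruct the fused image

    def forward(self, rgb, thermal):
        # Stage 1: multiscale features from the concatenated RGB-T input.
        feats = self.feat_extractor(torch.cat([rgb, thermal], dim=1))
        # Stage 2: enhance semantics and details, attend, then fuse.
        fused = self.agam(self.sfem(feats) + self.dfem(feats))
        return torch.sigmoid(self.head(fused))
```

In practice the diffusion extractor would return features at several scales that are fused level by level; the single-tensor interface above is a simplification for readability.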
Data Availability
The TSDS dataset is available at: https://www.kaggle.com/datasets/katrinawood/tsds-dataset
Acknowledgements
This work was funded by the International Cooperation Foundation of Jilin Province (20210402074GH) and the Autonomous Vehicle and Optoelectronic Instrument Innovation Project of Zhongshan City (CXTD2023002).
Author information
Contributions
Jin Meng: Conceptualization, Methodology, Software, Supervision, Validation, Writing - original draft. Jiahui Zou: Data curation, Writing - original draft. Zhuoheng Xiang: Data curation, Writing - original draft. Cui Wang: Writing -review & editing, Supervision. Shifeng Wang: Supervision, Funding acquisition. Yan Li: Writing - review & editing. Jonghyuk Kim: Writing -review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests or personal relationships that could have influenced this work.
Ethics approval
Not applicable
About this article
Cite this article
Meng, J., Zou, J., Xiang, Z. et al. Visible and thermal image fusion network with diffusion models for high-level visual tasks. Appl Intell 55, 286 (2025). https://doi.org/10.1007/s10489-024-06210-6