Abstract
Image fusion enhances applications such as security, autonomous driving, military surveillance, medical imaging, and environmental monitoring by combining complementary information. The fusion of visible and thermal (RGB-T) images is critical for improving both human observation and downstream visual tasks. However, most semantics-driven fusion algorithms train segmentation and fusion jointly, which increases computational cost while underutilizing semantic information. Designing a cleaner fusion architecture that mines rich deep semantic features is key to addressing this issue. This paper proposes a two-stage RGB-T image fusion network built on diffusion models. In the first stage, a diffusion model extracts multiscale features, providing rich semantics and texture edges for the fusion network. In the second stage, a semantic feature enhancement module (SFEM) and a detail feature enhancement module (DFEM) are proposed to improve the network's ability to describe fine details, and an adaptive global-local attention mechanism (AGAM) strengthens the weights of key features relevant to visual tasks. To benchmark the proposed algorithm, we created a new tri-modal sensor driving scene dataset (TSDS) comprising 15,234 sets of labeled images (visible, thermal, and degree-of-polarization images). A semantic segmentation model trained on our fused images achieved 78.41% accuracy, and an object detection model achieved 87.21% mAP. The experimental results indicate that our algorithm outperforms state-of-the-art image fusion algorithms.
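The following is a minimal sketch of the two-stage pipeline described in the abstract, written in a PyTorch style. The module names (SFEM, DFEM, AGAM) follow the abstract, but their internal layers, channel widths, and the interface of the stage-one feature extractor are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a two-stage RGB-T fusion pipeline as outlined in the
# abstract. All layer choices below are assumptions, not the published code.
import torch
import torch.nn as nn

class SFEM(nn.Module):
    """Semantic feature enhancement module (internals assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.conv(x)  # residual refinement of deep semantic features

class DFEM(nn.Module):
    """Detail feature enhancement module (internals assumed)."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
    def forward(self, x):
        return x + self.conv(x)  # residual refinement of texture and edge details

class AGAM(nn.Module):
    """Adaptive global-local attention; a simple channel gate stands in here."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.gate(x)  # re-weight features relevant to the visual task

class TwoStageFusion(nn.Module):
    def __init__(self, feat_extractor, ch=64):
        super().__init__()
        # Stage 1: a pretrained diffusion-model feature extractor (interface assumed).
        self.feat_extractor = feat_extractor
        self.sfem, self.dfem, self.agam = SFEM(ch), DFEM(ch), AGAM(ch)
        self.head = nn.Conv2d(ch, 1, 3, padding=1)  # reconstruct the fused image

    def forward(self, rgb, thermal):
        # Stage 1: multiscale features from the concatenated RGB-T input.
        feats = self.feat_extractor(torch.cat([rgb, thermal], dim=1))
        # Stage 2: enhance semantics and details, attend, then fuse.
        fused = self.agam(self.sfem(feats) + self.dfem(feats))
        return torch.sigmoid(self.head(fused))
```

In practice the diffusion extractor would return features at several scales that are fused level by level; the single-tensor interface above is a simplification for readability.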
Data Availability
The TSDS dataset is available at: https://www.kaggle.com/datasets/katrinawood/tsds-dataset
Acknowledgements
This work was funded by the International Cooperation Foundation of Jilin Province (20210402074GH) and the Autonomous Vehicle and Optoelectronic Instrument Innovation Project of Zhongshan City (CXTD2023002).
Author information
Contributions
Jin Meng: Conceptualization, Methodology, Software, Supervision, Validation, Writing - original draft. Jiahui Zou: Data curation, Writing - original draft. Zhuoheng Xiang: Data curation, Writing - original draft. Cui Wang: Writing -review & editing, Supervision. Shifeng Wang: Supervision, Funding acquisition. Yan Li: Writing - review & editing. Jonghyuk Kim: Writing -review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests or personal relationships that could have influenced this work.
Ethics approval
Not applicable
About this article
Cite this article
Meng, J., Zou, J., Xiang, Z. et al. Visible and thermal image fusion network with diffusion models for high-level visual tasks. Appl Intell 55, 286 (2025). https://doi.org/10.1007/s10489-024-06210-6