Cross-modality complementary information fusion for multispectral pedestrian detection

Yan, Chaoqi; Zhang, Hong; Li, Xuliang; Yang, Yifan; Yuan, Ding

doi:10.1007/s00521-023-08239-z

Original Article
Published: 31 January 2023

Neural Computing and Applications, Volume 35, pages 10361–10386 (2023)

Chaoqi Yan¹, Hong Zhang¹, Xuliang Li¹, Yifan Yang² & Ding Yuan¹ (ORCID: orcid.org/0000-0001-8107-7218)

1060 Accesses
5 Citations

Abstract

Multispectral pedestrian detection has received increasing attention in recent years because color and thermal modalities provide complementary visual information, especially under insufficient illumination. However, a crucial problem remains: how to design a cross-modality fusion mechanism that fully exploits the complementary characteristics of the different modalities. In this paper, we propose a novel cross-modality complementary information fusion network (denoted CCIFNet) that captures long-range interactions with precise positional information while preserving the inter-spatial relationship between the modalities in the feature extraction stage. Further, we design an adaptive illumination-aware weight generation module that weights the final detection confidence of the color and thermal modalities according to the illumination conditions. Specifically, we compare three fusion strategies for this module to find the best way of generating the final illumination-aware fusion weights. Finally, we present a simple but effective feature alignment module to alleviate the position shift problem caused by weakly aligned color-thermal image pairs. Extensive experiments and ablation studies on the KAIST, CVC-14, FLIR and LLVIP multispectral object detection datasets show that the proposed CCIFNet can achieve state-of-the-art performance under different illumination evaluation settings while keeping a competitive speed-accuracy trade-off for real-time applications.
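As a rough illustration of what weighting detection confidences by illumination can look like, the sketch below blends per-detection color and thermal scores with a single scalar weight. It is not the CCIFNet module: the abstract does not specify how the weights are generated or which of the three compared fusion strategies is used. The function name `fuse_confidences`, the parameter `w_color`, and the simple convex combination are all assumptions made for this example.

```python
def fuse_confidences(color_scores, thermal_scores, w_color):
    """Blend paired detection confidences from two modalities.

    Hypothetical helper, not taken from the paper. `w_color` is an assumed
    scalar in [0, 1] standing in for an illumination-aware weight: values
    near 1 trust the color branch (e.g. bright scenes), values near 0 trust
    the thermal branch (e.g. dark scenes).
    """
    if not 0.0 <= w_color <= 1.0:
        raise ValueError("w_color must lie in [0, 1]")
    if len(color_scores) != len(thermal_scores):
        raise ValueError("score lists must have the same length")
    # Convex combination per detection: w * color + (1 - w) * thermal.
    return [
        w_color * c + (1.0 - w_color) * t
        for c, t in zip(color_scores, thermal_scores)
    ]


if __name__ == "__main__":
    # Two candidate detections scored by each branch (made-up numbers).
    color = [0.8, 0.2]
    thermal = [0.4, 0.6]
    print(fuse_confidences(color, thermal, w_color=0.9))  # mostly color
    print(fuse_confidences(color, thermal, w_color=0.1))  # mostly thermal
```

With a weight near 1 the fused scores follow the color branch, and with a weight near 0 they follow the thermal branch. In a learned system the weight would come from a network that looks at the input images rather than being fixed by hand.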

Availability of data and materials

The datasets analysed during the current study are available from the following public resources:
https://soonminhwang.github.io/rgbt-ped-detection/
http://adas.cvc.uab.es/elektra/enigma-portfolio/cvc-14-visible-fir-day-night-pedestrian-sequence-dataset/
http://shorturl.at/ahAY4
https://bupt-ai-cz.github.io/LLVIP/

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61872019, 61972015, 62002005 and 62002008) and the high performance computing (HPC) resources at Beihang University.

Author information

Authors and Affiliations

Image Processing Center, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, People’s Republic of China
Chaoqi Yan, Hong Zhang, Xuliang Li & Ding Yuan
Institute of Artificial Intelligence, Beihang University, 37 Xueyuan Road, Haidian District, Beijing, 100191, People’s Republic of China
Yifan Yang

Corresponding author

Correspondence to Ding Yuan.

Ethics declarations

Conflict of interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yan, C., Zhang, H., Li, X. et al. Cross-modality complementary information fusion for multispectral pedestrian detection. Neural Comput & Applic 35, 10361–10386 (2023). https://doi.org/10.1007/s00521-023-08239-z

Received: 30 May 2022
Accepted: 06 January 2023
Published: 31 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00521-023-08239-z
