DOI: 10.1145/3581783.3613444
research-article

Attentive Alignment Network for Multispectral Pedestrian Detection

Published: 27 October 2023

Abstract

Multispectral pedestrian detection is of great importance in various around-the-clock applications, e.g., self-driving and video surveillance. Fusing the features from RGB images and thermal infrared (TIR) images to exploit the complementary information between the two modalities is one of the most effective ways to improve multispectral pedestrian detection performance. However, spatial misalignment between the modalities and the imbalance in their reliability introduce harmful information during feature fusion, limiting the performance of multispectral pedestrian detection. To address these issues, we propose an attentive alignment network, consisting of an attentive position alignment (APA) module and an attentive modality alignment (AMA) module. Our APA module emphasizes pedestrian regions while aligning those regions across modalities. Our AMA module utilizes a channel-wise attention mechanism with illumination guidance to eliminate the imbalance between the modalities. Experiments are conducted on two widely used multispectral detection datasets, KAIST and CVC-14. Our approach surpasses the current state-of-the-art performance on both datasets.
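
As a rough illustration of what "channel-wise attention with illumination guidance" could look like, the sketch below fuses an RGB and a TIR feature map in plain Python. It is not the authors' AMA module: the function names, the scalar `illum` daylight score, and the use of a bare sigmoid in place of learned layers are all assumptions made for this example.

```python
import math


def global_avg_pool(feat):
    """Return one mean value per channel for a C x H x W nested list."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(feat):
    """Squeeze-and-excitation style gates in (0, 1), one per channel.

    Assumption: a real module would pass the pooled vector through
    learned layers; here a plain sigmoid stands in for them.
    """
    return [sigmoid(v) for v in global_avg_pool(feat)]


def fuse(rgb, tir, illum):
    """Fuse two C x H x W maps of equal shape.

    illum is a hypothetical daylight score in [0, 1]: values near 1
    favour the RGB branch, values near 0 favour the TIR branch.
    """
    if not 0.0 <= illum <= 1.0:
        raise ValueError("illum must lie in [0, 1]")
    g_rgb, g_tir = channel_gates(rgb), channel_gates(tir)
    fused = []
    for c in range(len(rgb)):
        w_rgb = illum * g_rgb[c]          # illumination-scaled RGB gate
        w_tir = (1.0 - illum) * g_tir[c]  # illumination-scaled TIR gate
        fused.append([
            [w_rgb * a + w_tir * b for a, b in zip(row_rgb, row_tir)]
            for row_rgb, row_tir in zip(rgb[c], tir[c])
        ])
    return fused
```

With `illum = 0.0` (night) the output depends only on the gated TIR features, and with `illum = 1.0` only on the gated RGB features; intermediate values blend the two branches per channel.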

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attentive alignment
  2. multi-modal fusion
  3. pedestrian detection

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 202
  • Downloads (Last 6 weeks): 19
Reflects downloads up to 02 Mar 2025

Cited By

  • (2025) Controllable instance synthesis with hierarchical regularization for semi-supervised pedestrian detection. Neurocomputing 618 (128831). DOI: 10.1016/j.neucom.2024.128831. Online publication date: Feb-2025
  • (2025) Efficient Multispectral Object Detection with attentive feature aggregation leveraging zero-shot implicit illumination guidance. Information Fusion 118 (102939). DOI: 10.1016/j.inffus.2025.102939. Online publication date: Jun-2025
  • (2024) CF-deformable DETR. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (758-766). DOI: 10.24963/ijcai.2024/84. Online publication date: 3-Aug-2024
  • (2024) Pseudo-Multispectral Pedestrian Detection with Deep Thermal Feature Guidance. Guidance, Navigation and Control 04:03. DOI: 10.1142/S2737480724410048. Online publication date: 20-Jul-2024
  • (2024) DCMSTRD: End-to-end Dense Captioning via Multi-Scale Transformer Decoding. IEEE Transactions on Multimedia 26 (7581-7593). DOI: 10.1109/TMM.2024.3369863. Online publication date: 26-Feb-2024
  • (2024) MS-DETR: Multispectral Pedestrian Detection Transformer With Loosely Coupled Fusion and Modality-Balanced Optimization. IEEE Transactions on Intelligent Transportation Systems 25:12 (20628-20642). DOI: 10.1109/TITS.2024.3450584. Online publication date: Dec-2024
  • (2024) Revisiting misalignment in multispectral pedestrian detection: a language-driven approach for cross-modal alignment fusion. 2024 IEEE International Conference on Image Processing Challenges and Workshops (ICIPCW) (4217-4222). DOI: 10.1109/ICIPCW64161.2024.10769164. Online publication date: 27-Oct-2024
  • (2024) Cloud-Aware Fusion of Infrared and RGB Images for Aerial Target Detection. 2024 China Automation Congress (CAC) (6942-6949). DOI: 10.1109/CAC63892.2024.10865592. Online publication date: 1-Nov-2024
  • (2024) Dual enhanced semantic hashing for fast image retrieval. Multimedia Tools and Applications 83:25 (67083-67102). DOI: 10.1007/s11042-024-18275-z. Online publication date: 22-Jan-2024