Weakly Aligned Multi-spectral Pedestrian Detection via Cross-Modality Differential Enhancement and Multi-scale Spatial Alignment

  • Conference paper
Pattern Recognition (ICPR 2024)

Abstract

Multi-spectral pedestrian detection has attracted extensive attention in recent years. In particular, combining RGB and thermal infrared images enables around-the-clock operation, even under poor illumination. However, RGB and thermal infrared (RGB-T) image pairs are not well aligned, which degrades pedestrian detection accuracy. To address this, this paper proposes a Multi-scale Alignment and Differential Enhancement Network (MADENet) for multi-spectral pedestrian detection, consisting of a Cross-Modality Differential Enhancement Module (CDEM) and a Multi-scale Spatial Alignment Module (MSAM). CDEM is embedded in the backbone to suppress redundant features and extract complementary information between modalities, while MSAM aligns the RGB-T features by transforming the thermal features, using the RGB features as the reference. The proposed network is evaluated on the public KAIST dataset across different scenarios. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods, reaching a miss rate of 8.01 on the full test set.
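The abstract does not give the internals of CDEM or MSAM, so the following is only a toy NumPy sketch of the two ideas it names: exchanging information between modalities according to their difference, and transforming thermal features toward an RGB reference. The function names, the tanh-gated channel weights, and the brute-force integer-shift search are all assumptions made for illustration; the paper's modules are learned and multi-scale, and may work quite differently.

```python
import numpy as np

def differential_enhance(f_rgb, f_thermal):
    """Toy differential enhancement for two (C, H, W) feature maps (hypothetical)."""
    diff = f_rgb - f_thermal
    # Global average pooling: one scalar per channel, squashed to (-1, 1).
    w = np.tanh(diff.mean(axis=(1, 2)))[:, None, None]
    # Channels whose modalities agree on average get w near 0, so little is
    # exchanged; channels that differ exchange more information.
    return f_rgb + w * f_thermal, f_thermal + w * f_rgb

def align_thermal_to_rgb(f_rgb, f_thermal, max_shift=2):
    """Brute-force integer shift of thermal features toward the RGB reference."""
    best_err, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around at the borders; acceptable for a toy example.
            shifted = np.roll(f_thermal, shift=(dy, dx), axis=(1, 2))
            err = np.mean((shifted - f_rgb) ** 2)
            if err < best_err:
                best_err, best_shift = err, (dy, dx)
    aligned = np.roll(f_thermal, shift=best_shift, axis=(1, 2))
    return aligned, best_shift

# Simulate a misaligned pair: thermal is the RGB map shifted by (1, -2).
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 8, 8))
thermal = np.roll(rgb, shift=(1, -2), axis=(1, 2))

aligned, shift = align_thermal_to_rgb(rgb, thermal)
rgb_out, thermal_out = differential_enhance(rgb, aligned)
```

In this synthetic case the search recovers the inverse shift, and because the aligned maps are identical the difference is zero and the enhancement step leaves both inputs unchanged, which mirrors the idea that purely redundant features contribute nothing new.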

Acknowledgment

This work was sponsored by the Beijing Nova Program (20230484409), the National Natural Science Foundation of China (62272322, 62272323), and the Applied Basic Research Project of Liaoning Province (2022JH2/101300279).

Author information

Corresponding author

Correspondence to Yong Guan.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Shao, Z., Chen, Y., Zou, Y., Zhang, J., Guan, Y. (2025). Weakly Aligned Multi-spectral Pedestrian Detection via Cross-Modality Differential Enhancement and Multi-scale Spatial Alignment. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15330. Springer, Cham. https://doi.org/10.1007/978-3-031-78113-1_5

  • DOI: https://doi.org/10.1007/978-3-031-78113-1_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78112-4

  • Online ISBN: 978-3-031-78113-1

  • eBook Packages: Computer Science (R0)
