DOI: 10.1145/3638884.3638886
research-article

Multi-scale Pedestrian Detection Based on Attention Mechanism and Feature Fusion

Published: 23 April 2024

ABSTRACT

Scale variation among pedestrian targets is a major challenge in pedestrian detection: it makes it difficult for detection algorithms to accurately capture pedestrians at different scales. To address this problem, this paper proposes a multi-scale pedestrian detection method based on an attention mechanism and feature fusion. First, a new feature fusion module is constructed to compensate for the insufficient semantic information of shallow features, so that feature information from different scales is fully fused, strengthening the detector's feature extraction ability for small-scale pedestrians. Second, we introduce a spatial-channel attention mechanism into the network to suppress irrelevant background information and enhance the extraction of key pedestrian features. Finally, we optimize the original prior box parameters to generate prior boxes better suited to pedestrians, improving detection accuracy. Comparative experiments on the Caltech-USA and CityPersons pedestrian detection datasets show that our method achieves performance competitive with state-of-the-art methods.
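
The abstract does not state how the prior box parameters are optimized. As an illustration only, the sketch below uses one common approach: k-means clustering of ground-truth box sizes with 1 − IoU as the distance. This is an assumption and not necessarily the authors' procedure. The function names (`iou_wh`, `kmeans_priors`) and the sample box sizes are hypothetical.

```python
import random

def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_priors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs into k prior boxes, using 1 - IoU as the distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise from k distinct ground-truth boxes
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is equivalent to smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            groups[best].append(box)
        new_anchors = []
        for i, group in enumerate(groups):
            if group:
                mean_w = sum(b[0] for b in group) / len(group)
                mean_h = sum(b[1] for b in group) / len(group)
                new_anchors.append((mean_w, mean_h))
            else:
                new_anchors.append(anchors[i])  # keep the old anchor if its cluster is empty
        if new_anchors == anchors:
            break  # assignments have stabilised
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])

# Hypothetical pedestrian boxes (width, height) in pixels: small, medium, and large.
boxes = [(10, 24), (12, 30), (11, 27),
         (30, 72), (28, 70), (32, 80),
         (80, 200), (85, 210), (78, 190)]
priors = kmeans_priors(boxes, k=3)

# Mean best-match IoU is a common way to judge how well the priors fit the data.
mean_iou = sum(max(iou_wh(b, a) for a in priors) for b in boxes) / len(boxes)
print(priors, round(mean_iou, 3))
```

Because pedestrians are typically taller than they are wide, priors derived this way tend to have width-to-height ratios below 1, unlike generic object-detection defaults that also include wide boxes.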


  • Published in

    ICCIP '23: Proceedings of the 2023 9th International Conference on Communication and Information Processing
    December 2023
    648 pages
    ISBN: 9798400708909
    DOI: 10.1145/3638884

    Copyright © 2023 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 61 of 301 submissions, 20%