Convergence and Divergence: A New Paradigm for Pedestrian Detection

  • Conference paper
Advanced Intelligent Computing Technology and Applications (ICIC 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14868)

Abstract

Complex backgrounds, scale variance, and occlusion variance have long limited the accuracy of pedestrian detection. In this paper, we propose a pedestrian detector named Convergence and Divergence (CADNet). In the “Convergence” network, we propose a cross-scale semantic alignment block (CSAB). CSAB mitigates background interference and resolves scale variance by aggregating multi-scale global contexts, without extensive computational overhead. In the “Divergence” network, we propose a receptive field differentiation block (RFDB) to tackle scale and occlusion variance. RFDB generates discriminative features with varying receptive fields, capturing pedestrians across different scales and occlusion conditions. With these components, CADNet achieves 8.47% and 2.16% MR⁻² on the Reasonable subsets of CityPersons and Caltech, respectively. Extensive experiments demonstrate the robustness and efficiency of CADNet across a range of scenarios.
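
The results in the abstract are reported as a log-average miss rate, where lower is better. The abstract does not spell out the computation, so the following is only a minimal sketch that assumes the common Caltech-style convention: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between 10^-2 and 10^0, and the samples are combined with a geometric mean. The function name `log_average_miss_rate` and its arguments are hypothetical and are not taken from the paper.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rates sampled at FPPI references in [1e-2, 1e0].

    Assumption: `fppi` and `miss_rate` are parallel sequences describing a
    miss-rate-vs-FPPI curve (one entry per detection score threshold).
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")

    # Sort curve points by ascending FPPI so we can scan left to right.
    curve = sorted(zip(fppi, miss_rate))

    # Reference FPPI values, evenly spaced in log space from 1e-2 to 1e0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    sampled = []
    for ref in refs:
        # If the curve has no point at or below this reference, fall back to
        # a miss rate of 1.0 (every pedestrian missed).
        mr_at_ref = 1.0
        for x, y in curve:
            if x <= ref:
                mr_at_ref = y  # keep the last point with FPPI <= ref
            else:
                break
        # Clamp to avoid log(0) when the miss rate reaches exactly zero.
        sampled.append(max(mr_at_ref, 1e-10))

    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Illustrative curve only; these numbers are not results from the paper.
    example_fppi = [0.005, 0.05, 0.5]
    example_mr = [0.30, 0.15, 0.08]
    print(f"{100 * log_average_miss_rate(example_fppi, example_mr):.2f}%")
```

Because the samples are averaged in log space, improvements at low FPPI count as much as improvements at high FPPI, which is why the metric rewards detectors that stay accurate when few false positives are tolerated.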

Y. Zhu, H. Huang and S. Yue contributed equally to this work.

Acknowledgments

This work was supported by the National Key R&D Program of China under Grant 2022YFF0904300.

Corresponding author

Correspondence to Hai Huang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhu, Y., Huang, H., Yue, S., Zhang, S., Chen, A. (2024). Convergence and Divergence: A New Paradigm for Pedestrian Detection. In: Huang, DS., Zhang, C., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14868. Springer, Singapore. https://doi.org/10.1007/978-981-97-5600-1_36

  • DOI: https://doi.org/10.1007/978-981-97-5600-1_36

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5599-8

  • Online ISBN: 978-981-97-5600-1

  • eBook Packages: Computer Science, Computer Science (R0)
