FE-CSP: a fast and efficient pedestrian detector with center and scale prediction

The Journal of Supercomputing

Abstract

Pedestrian detection still faces several pressing problems: severe occlusion between pedestrians, difficulty in detecting small objects, and low detection speed. In this paper, we propose FE-CSP, a fast and efficient pedestrian detector with center and scale prediction. We combine channel attention with spatial attention, replace traditional convolution with deformable convolution, and embed the result in the backbone network to form CSANet (Channel and Spatial Attention Network), which efficiently extracts the semantic features of the object. We then propose a feature pyramid network that replaces the traditional concatenation for multi-scale feature detection, which effectively improves detection speed. On CityPersons, our method achieves 10.1%, 13.7% and 47.4% \(MR^{-2}\) on the Reasonable, Small and Heavy settings, respectively, at a speed of 0.21 s/img. On Caltech, our method achieves 5.2% \(MR^{-2}\) on the Reasonable setting at a speed of 0.06 s/img, further demonstrating the superiority and generalization ability of the proposed method.
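For readers unfamiliar with the \(MR^{-2}\) figures quoted in the abstract, the following is a minimal sketch of the log-average miss rate as it is commonly defined for the Caltech benchmark: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between \(10^{-2}\) and \(10^{0}\), and the geometric mean of those samples is reported. The function name, its arguments and the example curve are hypothetical and chosen for illustration only; this is not the authors' evaluation code, and the curve is not data from the paper.

```python
import math

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Log-average miss rate over FPPI in [1e-2, 1e0].

    fppi and miss_rate are parallel lists describing a miss-rate curve,
    with fppi sorted in ascending order.
    """
    # Reference FPPI values evenly spaced in log space: 10^-2 ... 10^0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Miss rate at the last curve point whose FPPI does not exceed ref;
        # if the curve starts above ref, fall back to a miss rate of 1.0.
        mr = 1.0
        for x, y in zip(fppi, miss_rate):
            if x <= ref:
                mr = y
            else:
                break
        sampled.append(max(mr, 1e-10))  # guard against log(0)
    # Geometric mean of the sampled miss rates.
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


# Hypothetical curve, for illustration only (not data from the paper).
curve_fppi = [0.001, 0.01, 0.1, 1.0]
curve_mr = [0.40, 0.20, 0.10, 0.05]
score = log_average_miss_rate(curve_fppi, curve_mr)
print(f"MR^-2 = {100 * score:.1f}%")
```

Lower values are better: a detector that misses fewer pedestrians at the same false-positive budget gets a smaller \(MR^{-2}\). Averaging in log space keeps the low-FPPI end of the curve from being swamped by the high-FPPI end.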

Data Availability

All data generated or analysed during this study are included in this published article.

Funding

This work is supported by the National Natural Science Foundation of China (Grant No. 61966035), the International Cooperation Project of the Science and Technology Department of the Autonomous Region (Grant No. 2020E01023), the Joint Foundation of the National Natural Science Foundation of China (Grant No. U1803261), the Autonomous Region Natural Science Foundation of China (Grant No. 2021D01C083) and the Autonomous Region Science and Technology Program Youth Science Fund Project (Grant No. 2022D01C83).

Author information

Corresponding author

Correspondence to Yurong Qian.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest related to this work, and no commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qin, Y., Qian, Y., Wei, H. et al. FE-CSP: a fast and efficient pedestrian detector with center and scale prediction. J Supercomput 79, 4084–4104 (2023). https://doi.org/10.1007/s11227-022-04815-7

  • DOI: https://doi.org/10.1007/s11227-022-04815-7
