SA-FPN: An effective feature pyramid network for crowded human detection

Applied Intelligence

Abstract

Crowded scenes not only contain instances at various scales but also exhibit a variety of occlusion patterns, ranging from non-occluded to heavily occluded cases, so the visible shapes of the instances differ widely. Both factors make it hard for human detectors to perform well in such scenes. Feature pyramid networks (FPN), an indispensable part of generic object detectors, can significantly boost detection performance on objects at different scales. In this paper, we therefore equip FPN with a multi-scale feature fusion technique and attention mechanisms to improve human detection in crowded scenarios. First, we design a feature pyramid structure with a refined hierarchical-split block, referred to as Scale-FPN, which better handles the challenging problem of scale variation across object instances. Second, we propose an attention-based lateral connection (ALC) module with spatial and channel attention mechanisms to replace the lateral connection in the FPN; it enhances the representational ability of the feature maps with rich spatial and semantic information and lets detectors focus on the important features of occlusion patterns. Additionally, we adopt a bottom-up path augmentation (BPA) module to exploit the features produced by the Scale-FPN and ALC modules. To verify the effectiveness of the proposed method, we combine Scale-FPN, ALC and BPA into SA-FPN and integrate it into the architecture of a crowded human detector. Experiments on the challenging CrowdHuman benchmark validate the effectiveness of SA-FPN. Specifically, it improves the state-of-the-art result of CrowdDet from 41.4% to 39.9% \(MR^{-2}\) (log-average miss rate, where lower is better), which indicates that the detector with SA-FPN produces fewer false positives.
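
To make the data flow between the pyramid levels concrete, here is a minimal pure-Python sketch of the two pathways the abstract builds on: the top-down pathway of FPN [29], where each lateral feature is added to a 2× nearest-neighbour upsampling of the coarser merged level, and a bottom-up path augmentation in the style of PANet [31], where each level is added to a stride-2 downsampling of the finer augmented level. It is a toy under stated assumptions, not the authors' Scale-FPN, ALC or BPA implementation: every feature map is a single-channel 2D list, the 1×1 lateral convolution is replaced by a hypothetical scalar `lateral_weight`, the 3×3 smoothing convolutions are omitted, and the stride-2 convolution is replaced by keeping every second pixel. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, H x W


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(list(wide))
        out.append(list(wide))
    return out


def downsample2x(x: Grid) -> Grid:
    """Stand-in for a stride-2 3x3 convolution: keep every second pixel."""
    return [row[::2] for row in x[::2]]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum; both maps must have the same shape."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[u + v for u, v in zip(r, s)] for r, s in zip(a, b)]


def lateral(x: Grid, weight: float) -> Grid:
    """Stand-in for a 1x1 lateral convolution on a single channel."""
    return [[weight * v for v in row] for row in x]


def top_down(backbone: List[Grid], lateral_weight: float = 1.0) -> List[Grid]:
    """FPN top-down pathway over backbone maps ordered fine to coarse.

    Each level must have exactly half the resolution of the previous one.
    Returns the merged maps in the same fine-to-coarse order.
    """
    merged = [lateral(backbone[-1], lateral_weight)]  # coarsest: lateral only
    for c in reversed(backbone[:-1]):
        merged.insert(0, add(lateral(c, lateral_weight), upsample2x(merged[0])))
    return merged


def bottom_up(pyramid: List[Grid]) -> List[Grid]:
    """Bottom-up path augmentation over maps ordered fine to coarse."""
    augmented = [pyramid[0]]  # the finest level is passed through unchanged
    for p in pyramid[1:]:
        augmented.append(add(downsample2x(augmented[-1]), p))
    return augmented


if __name__ == "__main__":
    c3 = [[1.0] * 4 for _ in range(4)]
    c4 = [[1.0] * 2 for _ in range(2)]
    c5 = [[1.0]]
    p3, p4, p5 = top_down([c3, c4, c5])
    n3, n4, n5 = bottom_up([p3, p4, p5])
    print(p4)  # [[2.0, 2.0], [2.0, 2.0]]
    print(n5)  # [[6.0]]
```

With all-ones inputs, each top-down merge adds one unit per level and each bottom-up step carries the finer sum upward, which makes the accumulation along both paths easy to trace by hand.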
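
The abstract characterises the ALC module only as a lateral connection with spatial and channel attention. One common way to combine the two is the sequential channel-then-spatial gating of CBAM [30]; the pure-Python sketch below illustrates that general pattern on a tiny feature map and should not be read as the paper's ALC design, whose internals and exact placement are not given here. Assumed simplifications: the shared two-layer MLP of the channel branch is replaced by a single linear layer `w` (a hypothetical C×C matrix), and the 7×7 convolution of the spatial branch is replaced by a weighted sum of the channel-mean and channel-max maps with hypothetical scalar weights `a` and `b`.

```python
import math
from typing import List

Grid = List[List[float]]
Feature = List[Grid]  # a list of channels, each an H x W grid


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; maps scores to gates in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def channel_attention(f: Feature, w: List[List[float]]) -> Feature:
    """Scale each channel by sigmoid(W @ avg_pool + W @ max_pool).

    w is a C x C matrix, a hypothetical single linear layer standing in for
    CBAM's shared two-layer MLP.
    """
    c = len(f)
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in f]
    mx = [max(max(row) for row in ch) for ch in f]
    gates = [
        sigmoid(sum(w[i][j] * avg[j] for j in range(c))
                + sum(w[i][j] * mx[j] for j in range(c)))
        for i in range(c)
    ]
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, f)]


def spatial_attention(f: Feature, a: float, b: float) -> Feature:
    """Scale each pixel by sigmoid(a * mean over channels + b * max over channels).

    The weighted sum of the two pooled maps is a hypothetical stand-in for
    CBAM's 7x7 convolution over the concatenated [mean; max] maps.
    """
    h, wd = len(f[0]), len(f[0][0])
    gate = [
        [
            sigmoid(a * sum(ch[y][x] for ch in f) / len(f)
                    + b * max(ch[y][x] for ch in f))
            for x in range(wd)
        ]
        for y in range(h)
    ]
    return [[[gate[y][x] * ch[y][x] for x in range(wd)] for y in range(h)] for ch in f]


def channel_then_spatial(f: Feature, w: List[List[float]], a: float, b: float) -> Feature:
    """CBAM-style ordering: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(f, w), a, b)


if __name__ == "__main__":
    feat = [[[4.0, 8.0]], [[0.0, 2.0]]]  # 2 channels, each 1 x 2
    # With all-zero weights every gate is sigmoid(0) = 0.5 in both branches.
    print(channel_then_spatial(feat, [[0.0, 0.0], [0.0, 0.0]], 0.0, 0.0))
```

Because every gate lies in [0, 1], this kind of module can only re-weight features, emphasising some channels and locations relative to others; it never amplifies a value beyond its input.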
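
\(MR^{-2}\) is the log-average miss rate reported on CrowdHuman: the miss rate is sampled at nine false-positives-per-image (FPPI) values evenly spaced in log space between \(10^{-2}\) and \(10^{0}\), and the samples are averaged geometrically. Lower is better, which is why a change from 41.4% to 39.9% counts as an improvement. The sketch below computes it from a miss-rate-versus-FPPI curve; the function name is hypothetical, and the rule for sampling the curve between its points (take the last point at or below each reference FPPI, else a miss rate of 1.0) is an assumption, since evaluation toolkits differ in such details.

```python
import math
from bisect import bisect_right
from typing import List


def log_average_miss_rate(fppi: List[float], miss_rate: List[float]) -> float:
    """Geometric mean of the miss rate at nine FPPI reference points.

    The reference points are 10**-2, 10**-1.75, ..., 10**0 (evenly spaced in
    log space). `fppi` must be sorted in increasing order and `miss_rate`
    aligned with it. Sampling rule (an assumption; toolkits differ): take the
    miss rate of the last curve point whose FPPI does not exceed the
    reference, or 1.0 if there is no such point.
    """
    if len(fppi) != len(miss_rate):
        raise ValueError("fppi and miss_rate must have the same length")
    refs = [10.0 ** (-2.0 + 0.25 * k) for k in range(9)]
    samples = []
    for r in refs:
        i = bisect_right(fppi, r)  # number of curve points with FPPI <= r
        m = miss_rate[i - 1] if i > 0 else 1.0
        samples.append(max(m, 1e-10))  # avoid log(0) for a perfect detector
    return math.exp(sum(math.log(m) for m in samples) / len(samples))


if __name__ == "__main__":
    # A hypothetical curve for illustration, not data from the paper.
    curve_fppi = [0.005, 0.02, 0.1, 0.5, 1.0]
    curve_mr = [0.80, 0.65, 0.50, 0.42, 0.40]
    print(f"MR^-2 = {log_average_miss_rate(curve_fppi, curve_mr):.3f}")
```

With a real detector, the two lists would come from sweeping the score threshold over all detections on the evaluation set and recording, at each threshold, the average number of false positives per image and the fraction of ground-truth persons missed.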

Figs. 1–11

References

  1. Nguyen DT, Li W, Ogunbona PO (2016) Human detection from images and videos: A survey. Pattern Recognition 51:148–175

  2. Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 300:17–33

  3. Zhang S, Benenson R, Omran M, Hosang J, Schiele B (2016) How far are we from solving pedestrian detection? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 1259–1267

  4. Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942

  5. Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision, Springer, pp 483–499

  6. Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 466–481

  7. Xiao T, Li S, Wang B, Lin L, Wang X (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3415–3424

  8. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  9. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211–252

  10. Wang X, Xiao T, Jiang Y, Shao S, Sun J, Shen C (2018) Repulsion loss: Detecting pedestrians in a crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7774–7783

  11. Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64

  12. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision 88(2):303–338

  13. Zhang K, Xiong F, Sun P, Hu L, Li B, Yu G (2019) Double anchor R-CNN for human detection in a crowd. arXiv:1909.09998

  14. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv:1506.01497

  15. Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) CrowdHuman: A benchmark for detecting human in a crowd. arXiv:1805.00123

  16. Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Relational learning for joint head and human detection. arXiv:1909.10674

  17. Mathias M, Benenson R, Timofte R, Van Gool L (2013) Handling occlusions with franken-classifiers. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1505–1512

  18. Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1904–1912

  19. Zhou C, Yuan J (2016) Learning to integrate occlusion-specific detectors for heavily occluded pedestrian detection. In: Asian Conference on Computer Vision, Springer, pp 305–320

  20. Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-NMS: Improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision, pp 5561–5569

  21. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware R-CNN: Detecting pedestrians in a crowd. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 637–653

  22. Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 784–799

  23. He Y, Zhu C, Wang J, Savvides M, Zhang X (2019) Bounding box regression with uncertainty for accurate object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2888–2897

  24. Liu S, Huang D, Wang Y (2019) Adaptive NMS: Refining pedestrian detection in a crowd. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6459–6468

  25. Rukhovich D, Sofiiuk K, Galeev D, Barinova O, Konushin A (2020) IterDet: Iterative scheme for object detection in crowded environments. arXiv:2005.05708

  26. Ge Z, Jie Z, Huang X, Xu R, Yoshie O (2020) PS-RCNN: Detecting secondary human instances in a crowd via primary object suppression. In: 2020 IEEE International Conference on Multimedia and Expo (ICME), IEEE, pp 1–6

  27. Zhu J, Yuan Z, Zhang C, Chi W, Ling Y et al (2020) Crowded human detection via an anchor-pair network. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 1391–1399

  28. Yuan P, Lin S, Cui C, Du Y, Guo R, He D, Ding E, Han S (2020) HS-ResNet: Hierarchical-split block on convolutional neural network. arXiv:2010.07621

  29. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2117–2125

  30. Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3–19

  31. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 8759–8768

  32. Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: A benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 304–311

  33. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 580–587

  34. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 779–788

  35. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9):1904–1916

  36. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1440–1448

  37. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: Single shot multibox detector. In: European Conference on Computer Vision, Springer, pp 21–37

  38. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2980–2988

  39. Law H, Deng J (2018) CornerNet: Detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 734–750

  40. Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv:1904.07850

  41. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 840–849

  42. Tan M, Pang R, Le QV (2020) EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10781–10790

  43. Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv:1911.09516

  44. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv:1804.02767

  45. Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2Det: A single-shot object detector based on multi-level feature pyramid network. arXiv e-prints pp arXiv–1811

  46. Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06), IEEE, vol 3, pp 850–855

  47. Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7794–7803

  48. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141

  49. Misra D, Nalamada T, Arasanipalai AU, Hou Q (2021) Rotate to attend: Convolutional triplet attention module. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3139–3148

  50. Pang Y, Xie J, Khan MH, Anwer RM, Khan FS, Shao L (2019) Mask-guided attention network for occluded pedestrian detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 4967–4975

  51. Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in CNNs. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6995–7003

  52. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: Common objects in context. In: European Conference on Computer Vision, Springer, pp 740–755

  53. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778

  54. Robertson S (2008) A new interpretation of average precision. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 689–690

  55. Chu X, Zheng A, Zhang X, Sun J (2020) Detection in crowded scenes: One proposal, multiple predictions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12214–12223

  56. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp 2961–2969

  57. Huang X, Ge Z, Jie Z, Yoshie O (2020) NMS by representative region: Towards crowded pedestrian detection by proposal pairing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10750–10759

  58. Zhou P, Zhou C, Peng P, Du J, Sun X, Guo X, Huang F (2020) NOH-NMS: Improving pedestrian detection by nearby objects hallucination. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 1967–1975

  59. Xu Z, Li B, Yuan Y, Dang A (2020) Beta R-CNN: Looking into pedestrian detection from another perspective. Advances in Neural Information Processing Systems

Acknowledgements

This work was supported by the “Thirteenth Five-Year Plan” Science and Technology Project of the Jilin Provincial Education Department (No. JJKH20190710KJ) and the Jilin Science and Technology Innovation Development Program Projects (No. 20190302202).

About this article

Cite this article

Zhou, X., Zhang, L. SA-FPN: An effective feature pyramid network for crowded human detection. Appl Intell 52, 12556–12568 (2022). https://doi.org/10.1007/s10489-021-03121-8

  • DOI: https://doi.org/10.1007/s10489-021-03121-8
