SA-FPN: An effective feature pyramid network for crowded human detection

Zhou, Xinxin; Zhang, Long

doi:10.1007/s10489-021-03121-8

Published: 08 February 2022

Applied Intelligence, Volume 52, pages 12556–12568 (2022)

Abstract

Crowded scenes not only contain instances at various scales but also introduce a variety of occlusion patterns, ranging from non-occluded to heavily occluded cases, so that instances take on very different shapes. Both factors make human detectors hard to apply in such scenes. Feature pyramid networks (FPN), an indispensable part of generic object detectors, can significantly boost detection performance on objects at different scales. In this paper, we therefore equip FPN with a multi-scale feature fusion technique and attention mechanisms to improve human detection in crowded scenes. First, we design a feature pyramid structure with a refined hierarchical-split block, referred to as Scale-FPN, which better handles the challenging problem of scale variation across object instances. Second, we propose an attention-based lateral connection (ALC) module with spatial and channel attention mechanisms to replace the lateral connection in the FPN; it enhances the representational ability of feature maps with rich spatial and semantic information and lets detectors focus on the important features of occlusion patterns. Additionally, we adopt a bottom-up path augmentation (BPA) module to exploit the features produced by the Scale-FPN and ALC modules. To verify the effectiveness of the proposed method, we combine Scale-FPN, ALC and BPA into SA-FPN and integrate it into the architecture of a crowded human detector. Experiments on the challenging CrowdHuman benchmark validate the effectiveness of SA-FPN. Specifically, it improves the state-of-the-art result of CrowdDet from 41.4% to 39.9% \(MR^{-2}\) (lower is better), which indicates that the detector with SA-FPN produces fewer false positives.
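
To make the data flow concrete, here is a toy sketch in plain Python of how the pieces described in the abstract could fit together on a three-level pyramid: attention-gated lateral inputs (standing in for ALC) merged top-down as in FPN, followed by a bottom-up augmentation pass (standing in for BPA). It is not the authors' implementation and rests on several assumptions: the hierarchical-split block of Scale-FPN is left out entirely; all levels are assumed to already share one channel count, so FPN's 1x1 lateral and 3x3 smoothing convolutions are dropped; the attention gates are CBAM-style (channel gate, then spatial gate) with their learned parts (shared MLP, 7x7 convolution) replaced by fixed sums; and downsampling uses 2x2 average pooling where PANet uses a stride-2 3x3 convolution. The helper names (`alc`, `top_down`, `bottom_up`, `sa_fpn_sketch`) are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]   # one channel: H rows x W columns
FeatureMap = List[Grid]    # C channels sharing the same H x W


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(f: FeatureMap) -> FeatureMap:
    # Simplified CBAM-style channel gate: one weight per channel from its
    # average- and max-pooled responses (CBAM feeds both through a shared MLP).
    out = []
    for ch in f:
        values = [v for row in ch for v in row]
        w = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * w for v in row] for row in ch])
    return out


def spatial_attention(f: FeatureMap) -> FeatureMap:
    # Simplified CBAM-style spatial gate: one weight per pixel from the mean and
    # max across channels (CBAM applies a 7x7 convolution to these two maps).
    h, w = len(f[0]), len(f[0][0])
    gate = [[sigmoid(sum(ch[y][x] for ch in f) / len(f) + max(ch[y][x] for ch in f))
             for x in range(w)] for y in range(h)]
    return [[[ch[y][x] * gate[y][x] for x in range(w)] for y in range(h)] for ch in f]


def alc(f: FeatureMap) -> FeatureMap:
    # Hypothetical attention-weighted lateral input: channel gate, then spatial gate.
    return spatial_attention(channel_attention(f))


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def upsample2x(f: FeatureMap) -> FeatureMap:
    # Nearest-neighbour: every pixel becomes a 2x2 block.
    return [[[row[x // 2] for x in range(2 * len(row))] for row in ch for _ in range(2)]
            for ch in f]


def downsample2x(f: FeatureMap) -> FeatureMap:
    # 2x2 average pooling (PANet uses a stride-2 3x3 convolution instead).
    return [[[(ch[2 * y][2 * x] + ch[2 * y][2 * x + 1]
               + ch[2 * y + 1][2 * x] + ch[2 * y + 1][2 * x + 1]) / 4.0
              for x in range(len(ch[0]) // 2)]
             for y in range(len(ch) // 2)]
            for ch in f]


def top_down(c: List[FeatureMap]) -> List[FeatureMap]:
    # c[0] is the finest backbone level, c[-1] the coarsest; each level halves H and W.
    p = [alc(c[-1])]
    for level in reversed(c[:-1]):
        p.insert(0, add(alc(level), upsample2x(p[0])))
    return p


def bottom_up(p: List[FeatureMap]) -> List[FeatureMap]:
    # Bottom-up path augmentation: carry fine-level detail back up the pyramid.
    n = [p[0]]
    for level in p[1:]:
        n.append(add(level, downsample2x(n[-1])))
    return n


def sa_fpn_sketch(c: List[FeatureMap]) -> List[FeatureMap]:
    return bottom_up(top_down(c))


if __name__ == "__main__":
    c3 = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels, 4x4
    c4 = [[[0.5] * 2 for _ in range(2)] for _ in range(2)]  # 2 channels, 2x2
    c5 = [[[0.25]] for _ in range(2)]                       # 2 channels, 1x1
    outs = sa_fpn_sketch([c3, c4, c5])
    print([(len(o), len(o[0]), len(o[0][0])) for o in outs])  # [(2, 4, 4), (2, 2, 2), (2, 1, 1)]
```

Every output level keeps the resolution of its input level. The finest output is simply the top-down result, and each coarser output adds a pooled copy of the level below it, which is how bottom-up augmentation shortens the path from fine, well-localized features to the coarse levels.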

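The headline number in the abstract is a log-average miss rate, so the drop from 41.4% to 39.9% \(MR^{-2}\) is an improvement: lower is better. \(MR^{-2}\) is conventionally computed by sampling the miss rate \(m\) at nine false-positives-per-image (FPPI) values \(f_i = 10^{-2 + (i-1)/4}\), \(i = 1, \dots, 9\), spaced evenly in log space over \([10^{-2}, 10^{0}]\), and taking their geometric mean, \(MR^{-2} = \exp\big(\tfrac{1}{9}\sum_{i=1}^{9} \ln m(f_i)\big)\). A minimal sketch follows, assuming the detector's operating points are already available as (FPPI, miss rate) pairs sorted by increasing FPPI. The function name is hypothetical, and the fallback of a miss rate of 1.0 when no operating point reaches a reference FPPI follows the usual Caltech-toolbox convention as we understand it; individual evaluation scripts may handle that edge case differently.

```python
import math
from typing import Sequence


def log_average_miss_rate(fppi: Sequence[float], miss_rate: Sequence[float]) -> float:
    """MR^-2: geometric mean of the miss rate at nine FPPI reference points
    spaced evenly in log space from 1e-2 to 1e0.

    `fppi` and `miss_rate` describe the same operating points, ordered by
    non-decreasing FPPI (i.e. by descending detection-score threshold).
    """
    refs = [10.0 ** (-2.0 + 0.25 * i) for i in range(9)]
    samples = []
    for r in refs:
        mr = 1.0  # no operating point within this FPPI budget: everything missed
        for f, m in zip(fppi, miss_rate):
            if f > r:
                break
            mr = m  # last operating point whose FPPI does not exceed r
        samples.append(max(mr, 1e-10))  # avoid log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


if __name__ == "__main__":
    # Hypothetical curve: miss rate 0.8 at very low FPPI, 0.2 once FPPI reaches 0.5.
    print(round(log_average_miss_rate([0.005, 0.5], [0.8, 0.2]), 3))  # 0.588
```

Because the average is taken over log-spaced FPPI values, the low-FPPI end of the curve (seven of the nine samples lie below FPPI 0.5) dominates the score, which is why \(MR^{-2}\) is read as a measure of how few false positives a detector needs to reach a given recall.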

References

Nguyen DT, Li W, Ogunbona PO (2016) Human detection from images and videos: A survey. Pattern Recognition 51:148–175
Brunetti A, Buongiorno D, Trotta GF, Bevilacqua V (2018) Computer vision and deep learning techniques for pedestrian detection and tracking: A survey. Neurocomputing 300:17–33
Zhang S, Benenson R, Omran M, Hosang J, Schiele B (2016) How far are we from solving pedestrian detection? In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1259–1267
Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv:1504.01942
Newell A, Yang K, Deng J (2016) Stacked hourglass networks for human pose estimation. In: European conference on computer vision, Springer, pp 483–499
Xiao B, Wu H, Wei Y (2018) Simple baselines for human pose estimation and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 466–481
Xiao T, Li S, Wang B, Lin L, Wang X (2017) Joint detection and identification feature learning for person search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 3415–3424
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211–252
Wang X, Xiao T, Jiang Y, Shao S, Sun J, Shen C (2018) Repulsion loss: Detecting pedestrians in a crowd. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7774–7783
Wu X, Sahoo D, Hoi SC (2020) Recent advances in deep learning for object detection. Neurocomputing 396:39–64
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision 88(2):303–338
Zhang K, Xiong F, Sun P, Hu L, Li B, Yu G (2019) Double anchor R-CNN for human detection in a crowd. arXiv:1909.09998
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv:1506.01497
Shao S, Zhao Z, Li B, Xiao T, Yu G, Zhang X, Sun J (2018) CrowdHuman: A benchmark for detecting human in a crowd. arXiv:1805.00123
Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Relational learning for joint head and human detection. arXiv:1909.10674
Mathias M, Benenson R, Timofte R, Van Gool L (2013) Handling occlusions with franken-classifiers. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1505–1512
Tian Y, Luo P, Wang X, Tang X (2015) Deep learning strong parts for pedestrian detection. In: Proceedings of the IEEE international conference on computer vision, pp 1904–1912
Zhou C, Yuan J (2016) Learning to integrate occlusion-specific detectors for heavily occluded pedestrian detection. In: Asian Conference on Computer Vision, Springer, pp 305–320
Bodla N, Singh B, Chellappa R, Davis LS (2017) Soft-NMS: Improving object detection with one line of code. In: Proceedings of the IEEE international conference on computer vision, pp 5561–5569
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Occlusion-aware R-CNN: Detecting pedestrians in a crowd. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 637–653
Jiang B, Luo R, Mao J, Xiao T, Jiang Y (2018) Acquisition of localization confidence for accurate object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 784–799
He Y, Zhu C, Wang J, Savvides M, Zhang X (2019) Bounding box regression with uncertainty for accurate object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 2888–2897
Liu S, Huang D, Wang Y (2019) Adaptive NMS: Refining pedestrian detection in a crowd. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 6459–6468
Rukhovich D, Sofiiuk K, Galeev D, Barinova O, Konushin A (2020) IterDet: Iterative scheme for object detection in crowded environments. arXiv:2005.05708
Ge Z, Jie Z, Huang X, Xu R, Yoshie O (2020) PS-RCNN: Detecting secondary human instances in a crowd via primary object suppression. In: 2020 IEEE International Conference on Multimedia and Expo (ICME), IEEE, pp 1–6
Zhu J, Yuan Z, Zhang C, Chi W, Ling Y, et al (2020) Crowded human detection via an anchor-pair network. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 1391–1399
Yuan P, Lin S, Cui C, Du Y, Guo R, He D, Ding E, Han S (2020) HS-ResNet: Hierarchical-split block on convolutional neural network. arXiv:2010.07621
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Dollár P, Wojek C, Schiele B, Perona P (2009) Pedestrian detection: A benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp 304–311
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9):1904–1916
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: Single shot MultiBox detector. In: European conference on computer vision, Springer, pp 21–37
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Law H, Deng J (2018) CornerNet: Detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv:1904.07850
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 840–849
Tan M, Pang R, Le QV (2020) EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
Liu S, Huang D, Wang Y (2019) Learning spatial fusion for single-shot object detection. arXiv:1911.09516
Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement. arXiv:1804.02767
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2018) M2Det: A single-shot object detector based on multi-level feature pyramid network. arXiv e-prints pp arXiv–1811
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR’06), IEEE, vol 3, pp 850–855
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7794–7803
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Misra D, Nalamada T, Arasanipalai AU, Hou Q (2021) Rotate to attend: Convolutional triplet attention module. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp 3139–3148
Pang Y, Xie J, Khan MH, Anwer RM, Khan FS, Shao L (2019) Mask-guided attention network for occluded pedestrian detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 4967–4975
Zhang S, Yang J, Schiele B (2018) Occluded pedestrian detection through guided attention in cnns. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 6995–7003
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: Common objects in context. In: European conference on computer vision, Springer, pp 740–755
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Robertson S (2008) A new interpretation of average precision. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp 689–690
Chu X, Zheng A, Zhang X, Sun J (2020) Detection in crowded scenes: One proposal, multiple predictions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12214–12223
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Huang X, Ge Z, Jie Z, Yoshie O (2020) NMS by representative region: Towards crowded pedestrian detection by proposal pairing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 10750–10759
Zhou P, Zhou C, Peng P, Du J, Sun X, Guo X, Huang F (2020) NOH-NMS: Improving pedestrian detection by nearby objects hallucination. In: Proceedings of the 28th ACM International Conference on Multimedia, pp 1967–1975
Xu Z, Li B, Yuan Y, Dang A (2020) Beta R-CNN: Looking into pedestrian detection from another perspective. Advances in Neural Information Processing Systems


Acknowledgements

This work was supported by “Thirteenth Five-Year Plan” Science and Technology Project of Jilin Provincial Education Department (No. JJKH20190710KJ) and Jilin Science and Technology Innovation Development Program Projects (No. 20190302202).

Author information

Authors and Affiliations

School of Computer Science, Northeast Electric Power University, 132012, Jilin, China
Xinxin Zhou & Long Zhang

About this article

Cite this article

Zhou, X., Zhang, L. SA-FPN: An effective feature pyramid network for crowded human detection. Appl Intell 52, 12556–12568 (2022). https://doi.org/10.1007/s10489-021-03121-8

Accepted: 16 December 2021
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10489-021-03121-8
