Abstract
The Region-based Fully Convolutional Network (R-FCN), designed for general object detection, is difficult to apply directly to pedestrian detection because of large variations in human pose and scale, and partial occlusion in surveillance scenarios. This paper presents a rapid pedestrian detection method with partial occlusion handling, built on the R-FCN framework. We introduce deep Omega-shape feature learning and multi-path detection to make our detector robust to changes in human pose and scale. A novel predicted-box fusion strategy is proposed to reduce the number of false negatives caused by partial occlusion in crowded environments. Our end-to-end approach achieves 95.35% mAP on the Caltech dataset, 96.22% mAP on the DukeMTMC dataset, and 97.43% mAP on the Bronze dataset, at a test-time speed of approximately 86 ms per image.
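The abstract's predicted-box fusion strategy merges overlapping detections rather than suppressing them, so heavily occluded pedestrians whose boxes overlap a stronger detection are not discarded as duplicates. The paper does not give the exact algorithm here, so the following is only a minimal generic sketch of score-weighted box fusion; the function names, the greedy grouping, and the IoU threshold of 0.5 are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def fuse_boxes(boxes, scores, iou_thr=0.5):
    """Greedily group boxes that overlap a higher-scoring box, then replace
    each group with a score-weighted average box carrying the group's
    maximum score (instead of deleting the lower-scoring members as
    classical NMS would)."""
    order = np.argsort(scores)[::-1]          # highest score first
    used = np.zeros(len(boxes), dtype=bool)
    fused_boxes, fused_scores = [], []
    for i in order:
        if used[i]:
            continue
        group = [i]
        used[i] = True
        for j in order:
            if not used[j] and iou(boxes[i], boxes[j]) >= iou_thr:
                group.append(j)
                used[j] = True
        w = scores[group] / scores[group].sum()                 # score weights
        fused_boxes.append((w[:, None] * boxes[group]).sum(axis=0))
        fused_scores.append(scores[group].max())
    return np.array(fused_boxes), np.array(fused_scores)
```

Compared with hard NMS, averaging the group's coordinates keeps evidence from partially occluded detections, which is the effect the abstract attributes to the fusion step.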
Funding
Funding was provided in part by the National Natural Science Foundation of China (Grant No. 61472063) and in part by the 2018 Fundamental Research Funds for the Central Universities.
Cite this article
Xu, Y., Zhou, X., Liu, P. et al. Rapid Pedestrian Detection Based on Deep Omega-Shape Features with Partial Occlusion Handing. Neural Process Lett 49, 923–937 (2019). https://doi.org/10.1007/s11063-018-9837-1