Abstract
Pedestrian detection is a critical task in computer vision, and it has made considerable progress with the help of convolutional neural networks (CNNs). However, a persistent problem is that small-scale pedestrians are notoriously difficult to detect because they exhibit weak contrast and blurred boundaries in real-world scenarios. In this paper, we present a simple and compact method for detecting multi-scale pedestrians, which is especially suitable for small-scale pedestrians that are not easily recognized in images or videos. We first interpret CNN channel features, explore the detection performance of different feature fusion methods, and propose a novel two-level feature fusion strategy specially designed for small-scale pedestrians. Moreover, a sub-network named “prediction module” is injected into the framework to improve general performance without any bells and whistles. In addition, we propose an adaptive loss that adds an adaptive adjustment coefficient to the Smooth L1 loss function to enhance its robustness for pedestrian detection tasks. Combining these methods, we achieve state-of-the-art detection performance on the Caltech pedestrian dataset under three evaluation protocols; in particular, detection of small-scale pedestrians under the “Far” evaluation setting improves, with the miss rate decreasing from 70.97% to 60.09%. Further, the proposed method achieves a competitive speed-accuracy trade-off, taking 0.31 seconds per image of 1024×2048 pixels on the CityPersons dataset.
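As a rough illustration of the loss mentioned above, the following minimal sketch shows the standard Smooth L1 loss on a single regression residual, together with a wrapper that scales it by a coefficient. The abstract does not define the adaptive adjustment coefficient, so the `coeff` argument, the multiplicative form, and the function names here are assumptions made purely for illustration and are not the authors' formulation.

```python
def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Standard Smooth L1 loss on one residual x.

    Quadratic for |x| < beta, linear otherwise.
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def scaled_smooth_l1(x: float, coeff: float, beta: float = 1.0) -> float:
    """Smooth L1 multiplied by an externally supplied coefficient.

    `coeff` is a hypothetical stand-in for the adaptive adjustment
    coefficient; its real definition is not given in the abstract, and
    applying it multiplicatively is an assumption.
    """
    return coeff * smooth_l1(x, beta)
```

With the default `beta`, small residuals fall in the quadratic region and large ones in the linear region, so a coefficient applied this way would rescale both regions uniformly.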
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61872019 and 61972015) and by the high-performance computing (HPC) resources at Beihang University.
Ethics declarations
Competing interests
None.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Yan, C., Zhang, H., Li, X. et al. R-SSD: refined single shot multibox detector for pedestrian detection. Appl Intell 52, 10430–10447 (2022). https://doi.org/10.1007/s10489-021-02798-1
DOI: https://doi.org/10.1007/s10489-021-02798-1