R-SSD: refined single shot multibox detector for pedestrian detection

Yan, Chaoqi; Zhang, Hong; Li, Xuliang; Yuan, Ding

doi:10.1007/s10489-021-02798-1

Published: 14 January 2022

Applied Intelligence, Volume 52, pages 10430–10447 (2022)

Chaoqi Yan (ORCID: orcid.org/0000-0002-3065-6903)¹, Hong Zhang¹, Xuliang Li¹ & Ding Yuan¹

Abstract

Pedestrian detection is a critical task in computer vision, and it has made considerable progress with the help of convolutional networks (ConvNets). However, a persistent problem is that small-scale pedestrians are notoriously difficult to detect because of weak contrast and blurred boundaries in real-world scenes. In this paper, we present a simple and compact method for detecting multi-scale pedestrians that is especially suited to small-scale pedestrians, which are not easily recognized in images or videos. We first interpret convolutional neural network (CNN) channel features, explore the detection performance of different feature fusion methods, and propose a novel two-level feature fusion strategy designed specifically for small-scale pedestrians. Moreover, a sub-network named “prediction module” is injected into the framework to improve overall performance without additional bells and whistles. In addition, we propose an adaptive loss that adds an adaptive adjustment coefficient to the Smooth L1 loss function to make it more robust for pedestrian detection. Combining these methods, we achieve state-of-the-art detection performance on the Caltech pedestrian dataset under three evaluation protocols; in particular, performance on small-scale pedestrians under the “Far” evaluation setting improves, with the miss rate decreasing from 70.97% to 60.09%. Further, the proposed method achieves a competitive speed-accuracy trade-off, taking 0.31 seconds per image of 1024×2048 pixels on the CityPersons dataset.
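The abstract describes the adaptive loss only as Smooth L1 with an added adaptive adjustment coefficient; the exact form of that coefficient is not given here. The following is a minimal Python sketch of the general structure, assuming the standard Smooth L1 definition (0.5x² when |x| < 1, |x| − 0.5 otherwise) and using a purely hypothetical coefficient: the batch's mean absolute residual, clipped to [0.5, 2.0]. The function names `smooth_l1`, `adaptive_coefficient`, and `adaptive_smooth_l1_loss` are illustrative only and are not the authors' formula or code.

```python
from typing import Sequence


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Standard Smooth L1 penalty; beta=1.0 gives 0.5*x^2 for |x|<1, else |x|-0.5."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def adaptive_coefficient(residuals: Sequence[float]) -> float:
    """HYPOTHETICAL adjustment coefficient (not the paper's definition).

    Uses the mean absolute residual, clipped to [0.5, 2.0], so batches with
    large regression errors are weighted up and near-converged ones down.
    """
    if not residuals:
        return 1.0
    mean_abs = sum(abs(r) for r in residuals) / len(residuals)
    return min(2.0, max(0.5, mean_abs))


def adaptive_smooth_l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean Smooth L1 over box-regression residuals, scaled by the hypothetical coefficient."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    residuals = [p - t for p, t in zip(pred, target)]
    if not residuals:
        return 0.0
    base = sum(smooth_l1(r) for r in residuals) / len(residuals)
    return adaptive_coefficient(residuals) * base
```

For example, `adaptive_smooth_l1_loss([0.2, 2.0], [0.0, 0.0])` averages the per-residual penalties 0.02 and 1.5 to a base loss of 0.76, then scales it by the coefficient 1.1, giving about 0.836. Keeping the coefficient as a separate function makes it easy to swap in a different adaptive rule without touching the Smooth L1 core.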

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61872019 and 61972015) and by the high-performance computing (HPC) resources at Beihang University.

Author information

Authors and Affiliations

Image Processing Center, Beihang University, Beijing, 102206, People’s Republic of China
Chaoqi Yan, Hong Zhang, Xuliang Li & Ding Yuan

Corresponding author

Correspondence to Ding Yuan.

Ethics declarations

Competing interests

None.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, C., Zhang, H., Li, X. et al. R-SSD: refined single shot multibox detector for pedestrian detection. Appl Intell 52, 10430–10447 (2022). https://doi.org/10.1007/s10489-021-02798-1

Accepted: 25 August 2021
Issue Date: July 2022
