R-SSD: refined single shot multibox detector for pedestrian detection

Applied Intelligence

Abstract

Pedestrian detection is a critical task in computer vision, and it has made considerable progress with the help of convolutional neural networks (CNNs). However, a persistent problem is that small-scale pedestrians are notoriously difficult to detect because they appear with weak contrast and blurred boundaries in real-world scenes. In this paper, we present a simple and compact method for detecting multi-scale pedestrians that is especially suitable for small-scale pedestrians that are not easily recognized in images or videos. We first interpret CNN channel features, explore the detection performance of different feature fusion methods, and propose a novel two-level feature fusion strategy designed specifically for small-scale pedestrians. Moreover, a sub-network named “prediction module” is added to the framework to improve general performance without any bells and whistles. In addition, we propose an adaptive loss that adds an adaptive adjustment coefficient to the Smooth L1 loss function to make it more robust for pedestrian detection. Combining these methods, we achieve state-of-the-art detection performance on the Caltech pedestrian dataset under three evaluation protocols; in particular, performance on small-scale pedestrians under the “Far” evaluation setting improves, with the miss rate decreasing from 70.97% to 60.09%. Further, the proposed method achieves a competitive speed-accuracy trade-off on the CityPersons dataset, at 0.31 seconds per 1024×2048-pixel image.
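
For readers unfamiliar with the Smooth L1 loss that the adaptive loss builds on, the following is a minimal sketch of the standard function together with a scalar multiplier. The abstract does not define the adaptive adjustment coefficient, so `coeff` here is a hypothetical placeholder and not the authors' formulation; the names `smooth_l1`, `weighted_smooth_l1`, and `beta` are likewise illustrative assumptions.

```python
def smooth_l1(x, beta=1.0):
    """Standard Smooth L1: quadratic for |x| < beta, linear otherwise."""
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def weighted_smooth_l1(pred, target, coeff=1.0, beta=1.0):
    """Mean Smooth L1 over box-regression offsets, scaled by `coeff`.

    `coeff` is a hypothetical stand-in for an adaptive adjustment
    coefficient; how the paper computes it is not stated in the abstract.
    """
    losses = [smooth_l1(p - t, beta) for p, t in zip(pred, target)]
    return coeff * sum(losses) / len(losses)


if __name__ == "__main__":
    # Two offsets: one in the quadratic region, one in the linear region.
    print(weighted_smooth_l1([0.5, 2.0], [0.0, 0.0], coeff=2.0))
```

With `coeff=1.0` this reduces to the plain Smooth L1 loss, so the multiplier only rescales how strongly localization errors are penalized.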


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61872019 and 61972015) and the high performance computing (HPC) resources at Beihang University.

Author information

Corresponding author

Correspondence to Ding Yuan.

Ethics declarations

Competing interests

None.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, C., Zhang, H., Li, X. et al. R-SSD: refined single shot multibox detector for pedestrian detection. Appl Intell 52, 10430–10447 (2022). https://doi.org/10.1007/s10489-021-02798-1

  • DOI: https://doi.org/10.1007/s10489-021-02798-1
