Multi-scale pedestrian detection using skip pooling and recurrent convolution

Multimedia Tools and Applications

Abstract

Detecting pedestrians at different scales is essential for applications such as autonomous driving. Recent research has shown that combining multiple feature maps with contextual information helps detect objects of different scales. In this paper, we propose a multi-scale pedestrian detector that combines skip pooling from multi-resolution feature maps with recurrent convolutional layers that extract contextual information. To fully exploit the unique characteristics of the features at different levels, the multi-scale features and the context features are fused at the fully connected layer. To gather spatial contextual information, we propose a modified recurrent convolutional layer that produces context feature maps at different resolutions. In addition, we construct a set of scale-dependent classification and bounding box regression subnetworks to further improve multi-scale pedestrian detection performance. Experiments on the Caltech and KITTI pedestrian detection benchmark datasets show that the proposed method achieves state-of-the-art performance at a faster speed.
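As a rough illustration of the skip pooling idea named in the abstract, the sketch below pools one candidate box from feature maps of different resolutions and concatenates the results into a single descriptor of the kind that could be passed to a fully connected layer. It is not the authors' implementation. The single-channel maps, the per-layer L2 normalization, the 2 x 2 output grid, the function names, and the 80-pixel height threshold used to pick a scale-dependent branch are all assumptions made for this example.

```python
import math


def roi_max_pool(feature_map, box, stride, out_size=2):
    """Max-pool the part of a single-channel feature map covered by `box`.

    feature_map: H x W nested list of floats.
    box: (x1, y1, x2, y2) in image pixels.
    stride: image pixels per feature-map cell (assumed known per layer).
    Returns a flat list of out_size * out_size values.
    """
    h, w = len(feature_map), len(feature_map[0])
    # Project the box onto the feature grid and clamp it so that the
    # region always contains at least one cell.
    x1 = min(max(math.floor(box[0] / stride), 0), w - 1)
    y1 = min(max(math.floor(box[1] / stride), 0), h - 1)
    x2 = min(max(math.ceil(box[2] / stride), x1 + 1), w)
    y2 = min(max(math.ceil(box[3] / stride), y1 + 1), h)
    rows, cols = y2 - y1, x2 - x1
    pooled = []
    for by in range(out_size):
        # Floor for the bin start and ceiling for the bin end: bins may
        # overlap on small regions but are never empty.
        ys = y1 + (by * rows) // out_size
        ye = y1 + math.ceil((by + 1) * rows / out_size)
        for bx in range(out_size):
            xs = x1 + (bx * cols) // out_size
            xe = x1 + math.ceil((bx + 1) * cols / out_size)
            pooled.append(max(feature_map[y][x]
                              for y in range(ys, ye)
                              for x in range(xs, xe)))
    return pooled


def l2_normalize(values, eps=1e-12):
    """Scale a vector to unit length so layers with different
    activation magnitudes contribute comparably (an assumption here)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / (norm + eps) for v in values]


def skip_pool(feature_maps, strides, box, out_size=2):
    """Pool the same box from every feature map and concatenate."""
    fused = []
    for fmap, stride in zip(feature_maps, strides):
        fused.extend(l2_normalize(roi_max_pool(fmap, box, stride, out_size)))
    return fused


def select_head(box, height_threshold=80.0):
    """Pick a scale-dependent branch by box height.
    The threshold is a hypothetical value, not taken from the paper."""
    return "small" if (box[3] - box[1]) < height_threshold else "large"


# Toy example: a fine 4 x 4 map (stride 1) and a coarse 2 x 2 map (stride 2).
fine = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
coarse = [[1.0, 2.0], [3.0, 4.0]]
box = (0, 0, 4, 4)
descriptor = skip_pool([fine, coarse], [1, 2], box)
head = select_head(box)
```

The descriptor has one fixed-length segment per feature map, so its size does not depend on the box size. In a real detector each map would have many channels and the pooled features would feed learned layers; the toy code only shows the data flow.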

Acknowledgements

This work is supported by the Industrial Core Technology Development Program of MOTIE/KEIT, Korea [#10083639, Development of Camera-based Real-time Artificial Intelligence System for Detecting Driving Environment and Recognizing Objects on Road Simultaneously].

Author information

Corresponding author

Correspondence to Chen Zhang.

About this article

Cite this article

Zhang, C., Kim, J. Multi-scale pedestrian detection using skip pooling and recurrent convolution. Multimed Tools Appl 78, 1719–1736 (2019). https://doi.org/10.1007/s11042-018-6240-x

  • DOI: https://doi.org/10.1007/s11042-018-6240-x
