Abstract
Detecting pedestrians of different scales is essential for applications like autonomous driving. Recent research has shown that combining multiple feature maps and contextual information helps detect objects of different scales. In this paper, we propose a multi-scale pedestrian detector that combines skip pooling from multi-resolution feature maps with recurrent convolutional layers for extracting contextual information. To fully exploit the unique characteristics of the features at different levels for multi-scale pedestrian detection, the multi-scale features and the context features are fused at the fully connected layer. To gather spatial contextual information, we propose a modified recurrent convolutional layer that produces context feature maps with different resolutions. In addition, we construct a set of scale-dependent classification and bounding box regression subnetworks to further improve the performance of multi-scale pedestrian detection. Experiments on the Caltech and KITTI pedestrian detection benchmarks show that the proposed method achieves state-of-the-art performance at a faster speed.
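The skip pooling idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (roi_max_pool, l2_normalize, skip_pool), the single-channel feature maps, the 2 x 2 output grid, and the L2 normalization before concatenation are all assumptions made for illustration. The sketch pools the same candidate box from feature maps of different resolutions and concatenates the results into one descriptor, which a fully connected layer could then consume.

```python
import math


def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(fmap, box, stride, out_size=2):
    """Max-pool the part of `fmap` covered by `box` into out_size*out_size values.

    fmap: 2D list (H x W), one channel only (assumption for brevity).
    box: (x1, y1, x2, y2) in image pixels; stride: image pixels per fmap cell.
    """
    h, w = len(fmap), len(fmap[0])
    # Project the image-space box onto this feature map and clamp to its bounds.
    x1 = min(max(math.floor(box[0] / stride), 0), w - 1)
    y1 = min(max(math.floor(box[1] / stride), 0), h - 1)
    x2 = min(max(math.ceil(box[2] / stride), x1 + 1), w)
    y2 = min(max(math.ceil(box[3] / stride), y1 + 1), h)
    pooled = []
    for i in range(out_size):
        # Row range of bin i; every bin covers at least one cell.
        ys = y1 + (i * (y2 - y1)) // out_size
        ye = max(y1 + ceil_div((i + 1) * (y2 - y1), out_size), ys + 1)
        for j in range(out_size):
            xs = x1 + (j * (x2 - x1)) // out_size
            xe = max(x1 + ceil_div((j + 1) * (x2 - x1), out_size), xs + 1)
            pooled.append(max(fmap[y][x]
                              for y in range(ys, ye)
                              for x in range(xs, xe)))
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit L2 norm so maps with different ranges are comparable."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def skip_pool(fmaps_with_strides, box, out_size=2):
    """Pool `box` from every (fmap, stride) pair and concatenate the descriptors."""
    fused = []
    for fmap, stride in fmaps_with_strides:
        fused.extend(l2_normalize(roi_max_pool(fmap, box, stride, out_size)))
    return fused


# Hypothetical toy inputs: a fine map (stride 4) and a coarse map (stride 8).
fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = [[1, 0], [0, 1]]
box = (0, 0, 16, 16)
descriptor = skip_pool([(fine, 4), (coarse, 8)], box)
```

Normalizing each pooled vector before concatenation keeps a feature map with large activations from dominating the fused descriptor; whether the paper uses this exact normalization is not stated in the abstract.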
Acknowledgements
This work is supported by the Industrial Core Technology Development Program of MOTIE/KEIT, Korea [#10083639, Development of Camera-based Real-time Artificial Intelligence System for Detecting Driving Environment and Recognizing Objects on Road Simultaneously].
Cite this article
Zhang, C., Kim, J. Multi-scale pedestrian detection using skip pooling and recurrent convolution. Multimed Tools Appl 78, 1719–1736 (2019). https://doi.org/10.1007/s11042-018-6240-x
DOI: https://doi.org/10.1007/s11042-018-6240-x