Multi-scale pedestrian detection using skip pooling and recurrent convolution

Zhang, Chen; Kim, Joohee

doi:10.1007/s11042-018-6240-x

Published: 29 June 2018

Volume 78, pages 1719–1736, (2019)
Multimedia Tools and Applications

Abstract

Detecting pedestrians at different scales is essential for applications such as autonomous driving. Recent research has shown that combining multiple feature maps with contextual information helps detect objects of different scales. In this paper, we propose a multi-scale pedestrian detector that combines skip pooling from multi-resolution feature maps with recurrent convolutional layers that extract contextual information. To fully exploit the distinct characteristics of features at different levels, the multi-scale features and the context features are fused at the fully connected layer. To gather spatial contextual information, we propose a modified recurrent convolutional layer that produces context feature maps at different resolutions. In addition, we construct a set of scale-dependent classification and bounding box regression subnetworks to further improve multi-scale pedestrian detection. Experiments on the Caltech and KITTI pedestrian detection benchmark datasets show that the proposed method achieves state-of-the-art performance while running faster.
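
The skip-pooling and scale-dependent routing ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 2x2 pooled size, the L2 normalisation before concatenation, and the 80-pixel height threshold are all assumptions made for illustration, and the recurrent convolutional context branch is omitted entirely.

```python
import numpy as np

def roi_max_pool(feature_map, roi, stride, out_size=2):
    """Max-pool the part of a (C, H, W) feature map covered by `roi`.

    `roi` is (x1, y1, x2, y2) in image pixels; `stride` is the assumed
    downsampling factor of this feature map relative to the image.
    """
    c, h, w = feature_map.shape
    # Project the ROI from image coordinates onto this feature map.
    x1, y1, x2, y2 = (int(round(v / stride)) for v in roi)
    x1 = min(max(x1, 0), w - 1)
    y1 = min(max(y1, 0), h - 1)
    x2 = min(max(x2, x1 + 1), w)
    y2 = min(max(y2, y1 + 1), h)
    xs = np.linspace(x1, x2, out_size + 1).astype(int)
    ys = np.linspace(y1, y2, out_size + 1).astype(int)
    pooled = np.zeros((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Make sure every bin covers at least one cell.
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            pooled[:, i, j] = feature_map[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return pooled

def skip_pool(feature_maps, strides, roi, out_size=2):
    """Pool one ROI from every feature map and concatenate the results."""
    parts = []
    for fmap, stride in zip(feature_maps, strides):
        v = roi_max_pool(fmap, roi, stride, out_size).ravel().astype(float)
        # Assumed L2 normalisation so that no single layer dominates the fusion.
        parts.append(v / (np.linalg.norm(v) + 1e-12))
    return np.concatenate(parts)

def pick_head(roi, height_threshold=80):
    """Route an ROI to a hypothetical scale-specific subnetwork by its height."""
    return "small" if (roi[3] - roi[1]) < height_threshold else "large"

# Three toy feature maps at strides 4, 8 and 16 for a 128x128 image.
rng = np.random.default_rng(0)
maps = [rng.random((4, 32, 32)), rng.random((8, 16, 16)), rng.random((16, 8, 8))]
strides = [4, 8, 16]
roi = (20, 10, 60, 110)
fused = skip_pool(maps, strides, roi)
print(fused.shape, pick_head(roi))
```

In a full detector the fused vector would feed the fully connected layer, and the selected head would run its own classification and bounding box regression; both are left out here to keep the sketch short.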

Acknowledgements

This work is supported by the Industrial Core Technology Development Program of MOTIE/KEIT, Korea [#10083639, Development of Camera-based Real-time Artificial Intelligence System for Detecting Driving Environment and Recognizing Objects on Road Simultaneously].

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
Chen Zhang & Joohee Kim

Corresponding author

Correspondence to Chen Zhang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, C., Kim, J. Multi-scale pedestrian detection using skip pooling and recurrent convolution. Multimed Tools Appl 78, 1719–1736 (2019). https://doi.org/10.1007/s11042-018-6240-x

Received: 28 February 2018
Revised: 22 May 2018
Accepted: 04 June 2018
Published: 29 June 2018
Issue Date: January 2019
DOI: https://doi.org/10.1007/s11042-018-6240-x
