Pedestrian Detection with a Directly-Cascaded Deconvolution-Convolution Structure

  • Conference paper
Advances in Multimedia Information Processing – PCM 2018 (PCM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Abstract

Driven by recent advances in deep learning, the accuracy of object detection has improved tremendously. However, detecting small and blurred pedestrians remains an open challenge. In this paper, we propose a novel neural network structure that can be flexibly combined with powerful object detection systems to boost pedestrian detection. The proposed structure contains two key modules: (i) a cascaded deconvolution-convolution (CDC) module that expands the resolution of feature maps while preserving the crucial information in them; and (ii) a double-helix connection (DHC) module that effectively fuses shallow-level and deep-level features in the detection network. The CDC module enables the network to reuse features from the lower layers and learn richer features from low-resolution input. In addition, the DHC module incorporates the features learned in different layers in a novel and unified fashion. Extensive experiments on the KITTI and Caltech Pedestrian datasets demonstrate that the proposed modules can be easily plugged into existing object detection networks (e.g., single-stage SSD and two-stage MSCNN) and consistently achieve better performance without bells and whistles.

This work is supported by NSFC (61471235) and the Shanghai ‘The Belt and Road’ Young Scholar Exchange Grant (17510740100).

G. Liu—NSFC 61622305, NSFC 61502238, NSFJPC BK20141003
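As a rough illustration of the idea behind the CDC module described in the abstract, the sketch below chains a transposed convolution (deconvolution) directly into an ordinary convolution on a single-channel feature map. It is a minimal pure-Python sketch under stated assumptions: the function names (`deconv2d`, `conv2d`, `cdc_block`), the kernel sizes, the stride of 2, and the fixed kernel values are all hypothetical choices for the example. The abstract does not specify the layer configuration, and in a real network the kernels would be learned.

```python
# Illustrative sketch only: single-channel, pure-Python transposed convolution
# followed by an ordinary convolution. Kernel sizes, stride, padding, and
# kernel values are assumptions for this example, not values from the paper.


def deconv2d(x, kernel, stride=2):
    """Transposed convolution without padding: every input value scatters a
    scaled copy of the kernel into the output, `stride` pixels apart."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


def conv2d(x, kernel, pad=1):
    """Stride-1 convolution (cross-correlation) with zero padding."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    padded = [[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = x[i][j]
    out_h = h + 2 * pad - k + 1
    out_w = w + 2 * pad - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                padded[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
    return out


def cdc_block(x, up_kernel, conv_kernel):
    """Deconvolution fed directly into a convolution (hypothetical layout)."""
    return conv2d(deconv2d(x, up_kernel, stride=2), conv_kernel, pad=1)


feature_map = [[1.0, 2.0],
               [3.0, 4.0]]
# A 2x2 all-ones kernel with stride 2 replicates each value into a 2x2 patch.
up_kernel = [[1.0, 1.0],
             [1.0, 1.0]]
# A 3x3 identity kernel keeps the upsampled map unchanged, which makes the
# resolution change easy to inspect; a trained kernel would refine it instead.
conv_kernel = [[0.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 0.0]]

upsampled = cdc_block(feature_map, up_kernel, conv_kernel)
for row in upsampled:
    print(row)
```

With these example kernels the 2×2 input becomes a 4×4 map, since the transposed convolution produces (2 − 1) · 2 + 2 = 4 rows and columns and the padded 3×3 convolution preserves that size.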

Corresponding author

Correspondence to Weiyao Lin.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, Z., Han, X., Lin, W., Cheng, MM., Liu, G., Xiong, H. (2018). Pedestrian Detection with a Directly-Cascaded Deconvolution-Convolution Structure. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_34

  • DOI: https://doi.org/10.1007/978-3-030-00776-8_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00775-1

  • Online ISBN: 978-3-030-00776-8

  • eBook Packages: Computer Science (R0)
