Progressive Refinement Network for Occluded Pedestrian Detection

Song, Xiaolin; Zhao, Kaili; Chu, Wen-Sheng; Zhang, Honggang; Guo, Jun

doi:10.1007/978-3-030-58592-1_3

Xiaolin Song¹²,
Kaili Zhao¹²,
Wen-Sheng Chu¹³,
Honggang Zhang¹² &
…
Jun Guo¹²

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12368))

Included in the following conference series:

European Conference on Computer Vision

3662 Accesses
34 Citations

Abstract

We present Progressive Refinement Network (PRNet), a novel single-stage detector that tackles occluded pedestrian detection. Motivated by human’s progressive process on annotating occluded pedestrians, PRNet achieves sequential refinement by three phases: Finding high-confident anchors of visible parts, calibrating such anchors to a full-body template derived from occlusion statistics, and then adjusting the calibrated anchors to final full-body regions. Unlike conventional methods that exploit predefined anchors, the confidence-aware calibration offers adaptive anchor initialization for detection with occlusions, and helps reduce the gap between visible-part and full-body detection. In addition, we introduce an occlusion loss to up-weigh hard examples, and a Receptive Field Backfeed (RFB) module to diversify receptive fields in early layers that commonly fire only on visible parts or small-size full-body regions. Experiments were performed within and across CityPersons, ETH, and Caltech datasets. Results show that PRNet can match the speed of existing single-stage detectors, consistently outperforms alternatives in terms of overall miss rate, and offers significantly better cross-dataset generalization. Code is available (https://github.com/sxlpris).

X. Song and K. Zhao—These authors contributed equally.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://bitbucket.org/shanshanzhang/citypersons/.

References

Brazil, G., Liu, X.: Pedestrian detection with autoregressive network phases. In: CVPR (2019)
Google Scholar
Brazil, G., Yin, X., Liu, X.: Illuminating pedestrians via simultaneous detection & segmentation. In: ICCV (2017)
Google Scholar
Chi, C., Zhang, S., Xing, J., Lei, Z., Li, S.Z., Zou, X.: Selective refinement network for high performance face detection. In: AAAI (2019)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.-F.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
Google Scholar
Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: an evaluation of the state of the art. TPAMI 34 (2012)
Google Scholar
Duan, G., Ai, H., Lao, S.: A structural filter approach to human detection. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6316, pp. 238–251. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15567-3_18
Chapter Google Scholar
Enzweiler, M., Eigenstetter, A., Schiele, B., Gavrila, D.M.: Multi-cue pedestrian classification with partial occlusion handling. In: CVPR (2010)
Google Scholar
Ess, A., Leibe, B., Van Gool, L.: Depth and appearance for mobile scene analysis. In: ICCV (2007)
Google Scholar
Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the kitti dataset. Int. J. Robot. Res. 32 (2013)
Google Scholar
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The kitti vision benchmark suite. In: CVPR (2012)
Google Scholar
Girshick, R.: Fast R-CNN. In: ICCV (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
Google Scholar
Li, J., Liang, X., Shen, S., Xu, T., Feng, J., Yan, S.: Scale-aware fast R-CNN for pedestrian detection. TMM 20 (2018)
Google Scholar
Lin, C., Lu, J., Wang, G., Zhou, J.: Graininess-aware deep feature learning for pedestrian detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11213, pp. 745–761. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01240-3_45
Chapter Google Scholar
Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: CVPR (2017)
Google Scholar
Liu, S., Huang, D., Wang, Y.: Adaptive NMS: refining pedestrian detection in a crowd. In: CVPR (2019)
Google Scholar
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Liu, W., Liao, S., Hu, W., Liang, X., Chen, X.: Learning efficient single-stage pedestrian detectors by asymptotic localization fitting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 643–659. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_38
Chapter Google Scholar
Liu, W., Liao, S., Ren, W., Hu, W., Yu, Y.: High-level semantic feature detection: a new perspective for pedestrian detection. In: CVPR (2019)
Google Scholar
Mathias, M., Benenson, R., Timofte, R., Gool, L.V.: Handling occlusions with Franken-classifiers. In: ICCV (2013)
Google Scholar
Nascimento, J.C., Marques, J.S.: Performance evaluation of object detection algorithms for video surveillance. TMM 8 (2006)
Google Scholar
Noh, J., Lee, S., Kim, B., Kim, G.: Improving occlusion and hard negative handling for single-stage pedestrian detectors. In: CVPR (2018)
Google Scholar
Ouyang, W., Wang, X.: A discriminative deep model for pedestrian detection with occlusion handling. In: CVPR (2012)
Google Scholar
Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: ICCV (2013)
Google Scholar
Ouyang, W., Zeng, X., Wang, X.: Modeling mutual visibility relationship in pedestrian detection. In: CVPR (2013)
Google Scholar
Ouyang, W., Wang, X.: Single-pedestrian detection aided by multi-pedestrian detection. In: CVPR (2013)
Google Scholar
Pang, Y., Xie, J., Khan, M.H., Anwer, R.M., Khan, F.S., Shao, L.: Mask-guided attention network for occluded pedestrian detection. In: ICCV (2019)
Google Scholar
Pepikj, B., Stark, M., Gehler, P., Schiele, B.: Occlusion patterns for object class detection. In: CVPR (2013)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: NIPS (2015)
Google Scholar
Shet, V.D., Neumann, J., Ramesh, V., Davis, L.S.: Bilattice-based logical reasoning for human detection. In: CVPR (2007)
Google Scholar
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv:1312.6034 (2013)
Song, T., Sun, L., Xie, D., Sun, H., Pu, S.: Small-scale pedestrian detection based on topological line localization and temporal feature aggregation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 554–569. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_33
Chapter Google Scholar
Tang, S., Andriluka, M., Schiele, B.: Detection and tracking of occluded people. IJCV 110 (2014)
Google Scholar
Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. In: ICCV (2015)
Google Scholar
Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: CVPR (2018)
Google Scholar
Wu, B., Nevatia, R.: Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors. In: ICCV (2005)
Google Scholar
Zhang, L., Lin, L., Liang, X., He, K.: Is faster R-CNN doing well for pedestrian detection? In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 443–457. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_28
Chapter Google Scholar
Zhang, S., Benenson, R., Omran, M., Hosang, J., Schiele, B.: How far are we from solving pedestrian detection? In: CVPR (2016)
Google Scholar
Zhang, S., Benenson, R., Schiele, B.: Citypersons: a diverse dataset for pedestrian detection. In: CVPR (2017)
Google Scholar
Zhang, S., Yang, J., Schiele, B.: Occluded pedestrian detection through guided attention in CNNs. In: CVPR (2018)
Google Scholar
Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Occlusion-aware R-CNN: detecting pedestrians in a crowd. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 657–674. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_39
Chapter Google Scholar
Zhou, C., Yang, M., Yuan, J.: Discriminative feature transformation for occluded pedestrian detection. In: ICCV (2019)
Google Scholar
Zhou, C., Yuan, J.: Multi-label learning of part detectors for heavily occluded pedestrian detection. In: ICCV (2017)
Google Scholar
Zhou, C., Yuan, J.: Bi-box regression for pedestrian detection and occlusion estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11205, pp. 138–154. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01246-5_9
Chapter Google Scholar

Download references

Acknowledgement

Research reported in this paper was supported in part by the Natural Science Foundation of China under grant 61701032 & 62076036 to KZ, and Beijing Municiple Science and Technology Commission project under Grant No.Z181100001918005 to XS and HZ. We thank Jayakorn Vongkulbhisal for helpful comments.

Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, Beijing, China
Xiaolin Song, Kaili Zhao, Honggang Zhang & Jun Guo
Google, Mountain View, CA, USA
Wen-Sheng Chu

Authors

Xiaolin Song
View author publications
You can also search for this author in PubMed Google Scholar
Kaili Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Wen-Sheng Chu
View author publications
You can also search for this author in PubMed Google Scholar
Honggang Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jun Guo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Kaili Zhao .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 14786 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Song, X., Zhao, K., Chu, WS., Zhang, H., Guo, J. (2020). Progressive Refinement Network for Occluded Pedestrian Detection. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12368. Springer, Cham. https://doi.org/10.1007/978-3-030-58592-1_3

Download citation

DOI: https://doi.org/10.1007/978-3-030-58592-1_3
Published: 03 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58591-4
Online ISBN: 978-3-030-58592-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics