Single-Stage Detector with Semantic Attention for Occluded Pedestrian Detection

Wen, Fang; Lin, Zehang; Yang, Zhenguo; Liu, Wenyin

doi:10.1007/978-3-030-05716-9_34

Fang Wen¹⁹,
Zehang Lin²⁰,
Zhenguo Yang^20,21 &
…
Wenyin Liu²⁰

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11296))

Included in the following conference series:

International Conference on Multimedia Modeling

2378 Accesses

Abstract

In this paper, we propose a pedestrian detection method with semantic attention based on the single-stage detector architecture (i.e., RetinaNet) for occluded pedestrian detection, denoted as PDSA. PDSA contains a semantic segmentation component and a detector component. Specifically, the first component uses visible bounding boxes for semantic segmentation, aiming to obtain an attention map for pedestrians and the inter-class (non-pedestrian) occlusion. The second component utilizes the single-stage detector to locate the pedestrian from the features obtained previously. The single-stage detector adopts over-sampling of possible object locations, which is faster than two-stage detectors that train classifier to identify candidate object locations. In particular, we introduce the repulsion loss to deal with the intra-class occlusion. Extensive experiments conducted on the public CityPersons dataset demonstrate the effectiveness of PDSA for occluded pedestrian detection, which outperforms the state-of-the-art approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Semantic Structural and Occlusive Feature Fusion for Pedestrian Detection

Self-attention-guided scale-refined detector for pedestrian detection

Article Open access 22 April 2022

Key points and visible part fusion attention network for occluded pedestrian detection in traffic environments

Article 30 May 2024

References

Lin, T., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125 (2017)
Google Scholar
Lin, T., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: International Conference on Computer Vision (ICCV), pp. 2999–3007 (2017)
Google Scholar
Girshick, R.: Fast R-CNN. In: Computer Vision and Pattern Recognition (CVPR), pp. 1440–1448 (2015)
Google Scholar
Zhang, S., Yang, J., Schiele, B.: Occluded pedestrian detection through guided attention in CNNs. In: Computer Vision and Pattern Recognition (CVPR), pp. 6995–7003 (2018)
Google Scholar
Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Fu, C., Liu, W., Ranga, A., Tyagi, A., Berg, A.: DSSD: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 (2017)
Wang, X., Xiao, T., Jiang, Y., Shao, S., Sun, J., Shen, C.: Repulsion loss: detecting pedestrians in a crowd. In: International Conference on Computer Vision (CVPR) (2018)
Google Scholar
Luo, P., Tian, Y., Wang, X., Tang, X.: Switchable deep network for pedestrian detection. In: Computer Vision and Pattern Recognition (CVPR) (2014)
Google Scholar
Hosang, J., Omran, M., Benenson, R., Schiele, B.: Taking a deeper look at pedestrians. In: Computer Vision and Pattern Recognition (CVPR), pp. 4073–4082 (2015)
Google Scholar
Zhang, S., Benenson, R., Schiele, B.: Filtered channel features for pedestrian detection. In: Computer Vision and Pattern Recognition (CVPR) (2015)
Google Scholar
Li, J., Liang, X., Shen, S., Xu, T., Yan, S.: Scale-aware fast R-CNN for pedestrian detection. IEEE Trans. Multimedia 20(4), 985–996 (2017)
Google Scholar
Cai, Z., Fan, Q., Feris, Rogerio S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 354–370. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_22
Chapter Google Scholar
Zhang, L., Lin, L., Liang, X., He, K.: Is faster R-CNN doing well for pedestrian detection? In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 443–457. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_28
Chapter Google Scholar
Ouyang, W., Wang, X.: A discriminative deep model for pedestrian detection with occlusion handling. In: Computer Vision and Pattern Recognition (CVPR) (2012)
Google Scholar
Mathias, M., Benenson, R., Timofte, R., Van, L.: Handling occlusions with Franken-classifiers. In: International Conference on Computer Vision (ICCV) (2013)
Google Scholar
Tian, Y., Luo, P., Wang, X., Tang, X.: Deep learning strong parts for pedestrian detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1904–1912 (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (ICLR) (2014)
Google Scholar
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, vol. 60, pp. 1097–1105 (2012)
Google Scholar
Jiang, Y., Jiang, Y., Cao, Z., Cao, Z., Huang, T.: UnitBox: an advanced object detection network. In: ACM on Multimedia Conference, pp. 516–520 (2016)
Google Scholar
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018)
Article Google Scholar
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection. In: Computer Vision and Pattern Recognition (CVPR) (2017)
Google Scholar
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (ICAI), pp. 249–256 (2010)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Song, T., Sun, L., Xie, D., Sun, H., Pu, S.: Small-scale pedestrian detection based on somatic topology localization and temporal feature aggregation. arXiv preprint arXiv:1807.01438 (2018)

Download references

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61703109, No. 91748107), China Postdoctoral Science Foundation (No. 2018M643026), and the Guangdong Innovative Research Team Program (No. 2014ZT05G157).

Author information

Authors and Affiliations

Department of Automation, Guangdong University of Technology, Guangzhou, China
Fang Wen
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Zehang Lin, Zhenguo Yang & Wenyin Liu
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Zhenguo Yang

Authors

Fang Wen
View author publications
You can also search for this author in PubMed Google Scholar
Zehang Lin
View author publications
You can also search for this author in PubMed Google Scholar
Zhenguo Yang
View author publications
You can also search for this author in PubMed Google Scholar
Wenyin Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Zhenguo Yang or Wenyin Liu .

Editor information

Editors and Affiliations

Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Ioannis Kompatsiaris
EURECOM, Sophia Antipolis, France
Benoit Huet
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Vasileios Mezaris
Dublin City University, Dublin, Ireland
Cathal Gurrin
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
Stefanos Vrochidis

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wen, F., Lin, Z., Yang, Z., Liu, W. (2019). Single-Stage Detector with Semantic Attention for Occluded Pedestrian Detection. In: Kompatsiaris, I., Huet, B., Mezaris, V., Gurrin, C., Cheng, WH., Vrochidis, S. (eds) MultiMedia Modeling. MMM 2019. Lecture Notes in Computer Science(), vol 11296. Springer, Cham. https://doi.org/10.1007/978-3-030-05716-9_34

Download citation

DOI: https://doi.org/10.1007/978-3-030-05716-9_34
Published: 11 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05715-2
Online ISBN: 978-3-030-05716-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics