An improved one-stage pedestrian detection method based on multi-scale attention feature extraction

Ma, Jun; Wan, Honglin; Wang, Junxia; Xia, Hao; Bai, Chengjie

doi:10.1007/s11554-021-01074-2


Original Research Paper
Published: 22 January 2021

Volume 18, pages 1965–1978 (2021)

Journal of Real-Time Image Processing

Jun Ma¹, Honglin Wan¹, Junxia Wang², Hao Xia¹ & Chengjie Bai¹ (ORCID: orcid.org/0000-0003-3783-9419)

900 Accesses
18 Citations

Abstract

In recent years, the performance of pedestrian detection methods based on convolutional neural networks (CNNs) has improved significantly. However, an imbalance remains between detection accuracy and speed. In this paper, we adopt a one-stage object detection framework and propose a pedestrian detection method based on a multi-scale attention mechanism in a CNN to achieve a better balance between accuracy and speed. First, a multi-scale convolution module is designed to extract features at different scales. Second, an attention module mines the association information between features from both the spatial and the channel perspectives to strengthen the original features. Then, the enhanced features are passed through a classification and regression module to perform object localisation and bounding box regression. Finally, to learn more pedestrian location information, we improve the loss function to achieve better network training. The proposed method achieves strong results on the challenging CityPersons and Caltech pedestrian detection datasets.
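The abstract does not give layer configurations, so the following NumPy sketch only illustrates the general pattern of "multi-scale features strengthened by channel and spatial attention" under stated assumptions. The multi-scale branches are fixed mean filters of size 1, 3 and 5 followed by random 1x1 projections, standing in for learned convolutions; the channel attention is swapped in as a squeeze-and-excitation style block and the spatial attention as a CBAM-style block, both well-known designs that are not necessarily the authors' modules; the residual addition is one assumed way to "strengthen the original features". All names and parameters (`multi_scale_block`, `attention_enhance`, `reduction=4`, the 7x7 spatial kernel) are hypothetical, the weights are random rather than trained, and the classification/regression head and the modified loss are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; random weights stand in for trained ones (assumption)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(weight, x):
    """1x1 convolution: mix channels at every pixel. weight: (C_out, C_in), x: (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)

def box_filter(x, k):
    """Zero-padded k x k mean filter per channel ('same' size, stride 1, odd k).
    Hypothetical stand-in for a learned k x k convolution branch."""
    c, h, w = x.shape
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def multi_scale_block(x, kernel_sizes, branch_w, fuse_w):
    """Parallel branches with different receptive fields, concatenated, then fused by a 1x1 conv."""
    branches = [conv1x1(w, box_filter(x, k)) for k, w in zip(kernel_sizes, branch_w)]
    return conv1x1(fuse_w, np.concatenate(branches, axis=0))

def channel_attention(x, w1, w2):
    """SE-style: global average pool -> bottleneck MLP -> per-channel weights in (0, 1)."""
    squeeze = x.mean(axis=(1, 2))                        # (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0))  # (C,)
    return x * weights[:, None, None], weights

def spatial_attention(x, kernel):
    """CBAM-style: channel-wise mean and max maps -> k x k conv -> per-pixel weights in (0, 1)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    _, h, w = pooled.shape
    p = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))
    logits = np.zeros((h, w))
    for dy in range(kernel.shape[-1]):
        for dx in range(kernel.shape[-1]):
            logits += np.einsum("c,chw->hw", kernel[:, dy, dx], padded[:, dy:dy + h, dx:dx + w])
    weights = sigmoid(logits)
    return x * weights[None], weights

def attention_enhance(x, params):
    """Multi-scale features, reweighted by channel then spatial attention, added back residually."""
    feats = multi_scale_block(x, params["kernel_sizes"], params["branch_w"], params["fuse_w"])
    feats, _ = channel_attention(feats, params["w1"], params["w2"])
    feats, _ = spatial_attention(feats, params["spatial_k"])
    return x + feats

def init_params(c, reduction=4, kernel_sizes=(1, 3, 5), spatial_size=7):
    """Random (untrained) weights for every hypothetical module above."""
    return {
        "kernel_sizes": kernel_sizes,
        "branch_w": [rng.normal(0, 0.1, (c, c)) for _ in kernel_sizes],
        "fuse_w": rng.normal(0, 0.1, (c, c * len(kernel_sizes))),
        "w1": rng.normal(0, 0.1, (c // reduction, c)),
        "w2": rng.normal(0, 0.1, (c, c // reduction)),
        "spatial_k": rng.normal(0, 0.1, (2, spatial_size, spatial_size)),
    }

# Usage: enhance a (C, H, W) feature map, e.g. one produced by a backbone stage.
params = init_params(c=16)
x = rng.normal(size=(16, 32, 32))
y = attention_enhance(x, params)
print(y.shape)  # → (16, 32, 32)
```

Both attention blocks only rescale features by weights in (0, 1), so the residual connection keeps the original signal intact while emphasising informative channels and locations; a trained detector would learn these weights instead of drawing them at random.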



Acknowledgements

This study is sponsored by the China Shandong Key R&D Plan (2018GGX106008), and is supported by the China Shandong Key Laboratory of Medical Physical Image Processing Technology.

Author information

Authors and Affiliations

School of Physics and Electronics, Shandong Normal University, Jinan, 250358, China
Jun Ma, Honglin Wan, Hao Xia & Chengjie Bai
School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
Junxia Wang


Corresponding author

Correspondence to Chengjie Bai.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Ma, J., Wan, H., Wang, J. et al. An improved one-stage pedestrian detection method based on multi-scale attention feature extraction. J Real-Time Image Proc 18, 1965–1978 (2021). https://doi.org/10.1007/s11554-021-01074-2


Received: 18 August 2020
Accepted: 08 January 2021
Published: 22 January 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11554-021-01074-2
