
SARNet: Spatial Attention Residual Network for pedestrian and vehicle detection in large scenes

Published in: Applied Intelligence

Abstract

With the development of high-resolution camera technology, a single shooting scene can now cover an area on the order of a square kilometer: thousands of people can be observed simultaneously, and faces are clearly recognizable from a hundred meters away. Images captured by such cameras differ greatly from those captured by conventional cameras: they contain many detection targets, exhibit large variations in target scale due to spatial position, and suffer from target overlap and occlusion, which make feature extraction difficult and degrade detection results. To address these problems, this paper proposes SARNet, a multi-target detection method that optimizes feature extraction with spatial attention. Spatial attention is used to optimize the backbone network, expanding the local receptive field to strengthen representational ability and improve feature extraction for small targets; features at different scales from a dilated feature pyramid network are then processed by deformable region-of-interest pooling, which effectively improves detection accuracy across scales. Experimental results show that the proposed method achieves 51.9% mAP on the PANDA dataset, surpassing existing detection algorithms. Further experiments on pedestrians and vehicles in the COCO2017 dataset confirm the feasibility of the method.
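The attention-weighted feature extraction described above can be illustrated with a minimal pure-Python sketch. This is only a generic spatial-attention example, not the paper's actual SARNet module (whose architecture is detailed in the full text): each spatial location's channels are pooled into a saliency score, squashed through a sigmoid, and used to re-weight that location's features so informative regions (e.g. small targets) are emphasised.

```python
import math

def spatial_attention(feature_map):
    """Minimal spatial-attention sketch over an H x W x C feature map
    (nested lists). Weights each spatial location by a sigmoid of its
    channel-averaged response and rescales all channels accordingly."""
    h, w = len(feature_map), len(feature_map[0])
    # Channel-average pooling: one saliency score per spatial location.
    pooled = [[sum(feature_map[i][j]) / len(feature_map[i][j])
               for j in range(w)] for i in range(h)]
    # Sigmoid maps each score to an attention weight in (0, 1).
    attn = [[1.0 / (1.0 + math.exp(-pooled[i][j]))
             for j in range(w)] for i in range(h)]
    # Re-weight every channel at each location by its attention weight.
    return [[[c * attn[i][j] for c in feature_map[i][j]]
             for j in range(w)] for i in range(h)]
```

In a real detector this weighting would be learned (e.g. via convolutions over the pooled map) and applied inside the backbone, rather than computed from a fixed average-pool-plus-sigmoid as here.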




Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grants U1803261, 61562086, and 61966035; the Funds for Creative Research Groups of Higher Education of Xinjiang Uygur Autonomous Region under Grant XJEDU2017T002; the Xinjiang Uygur Autonomous Region Graduate Innovation Project under Grant XJ2019G072; and the Tianshan Innovation Team Plan Project of Xinjiang Uygur Autonomous Region under Grant 202101642.

Author information


Corresponding author

Correspondence to Yurong Qian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wei, H., Zhang, Q., Han, J. et al. SARNet: Spatial Attention Residual Network for pedestrian and vehicle detection in large scenes. Appl Intell 52, 17718–17733 (2022). https://doi.org/10.1007/s10489-022-03217-9

