Multi-branch detection network based on trigger attention for pedestrian detection under occlusion

Applied Intelligence

Abstract

Center and Scale Prediction (CSP) first introduced the anchor-free method to the field of pedestrian detection. Pedestrian detection often takes place in complex scenes with occlusion, where the single centre-point prediction used by CSP struggles to extract pedestrian features. To solve this problem, this paper presents a multi-branch detection network (MBDN) based on trigger attention. First, a multi-centre-point prediction branch feature extraction model (multi-centre) is proposed to address CSP's missed detections in occlusion scenarios. Second, a novel trigger attention module is designed. The module uses visible parts as triggers to learn the weight relationships among the branches, so that the network learns the confidence of the centre point of each branch and strengthens the branch in which the visible area on the feature map is located. Finally, a channel non-maximum suppression (NMS) module is used in MBDN to reduce redundant bounding boxes. Experimental results show that the log-average miss rate (MR⁻²) on the heavy subset is reduced from 49.63% to 45.51% while performance on the reasonable subset is maintained. Code and models are available at https://github.com/weidalin/MBDN.
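
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a log-average miss rate over the FPPI range [10⁻², 10⁰] is commonly computed, following the convention of the Caltech pedestrian benchmark: the miss rate is sampled at nine reference points evenly spaced in log space and averaged geometrically. The function name `log_average_miss_rate`, its arguments, and the fallback value of 1.0 when the curve has no point at or below a reference FPPI are assumptions made for illustration; this is not taken from the MBDN code base.

```python
import math


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of the miss rate sampled at reference FPPI values.

    fppi and miss_rate describe one detector's curve and must have the same
    length, with fppi sorted in ascending order (an assumption of this sketch).
    """
    if len(fppi) != len(miss_rate) or not fppi:
        raise ValueError("fppi and miss_rate must be non-empty and equal in length")

    # Reference points evenly spaced in log space between 1e-2 and 1e0.
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]

    log_sum = 0.0
    for ref in refs:
        # Miss rate at the largest FPPI that does not exceed the reference.
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        # Assumed fallback: treat the miss rate as 1.0 if the curve starts
        # above this reference point.
        mr = candidates[-1] if candidates else 1.0
        # Clamp to avoid log(0) for a perfect detector.
        log_sum += math.log(max(mr, 1e-10))

    return math.exp(log_sum / num_points)
```

A lower value is better. Under this definition, the change from 49.63% to 45.51% reported above is a reduction of 4.12 percentage points on the heavy subset.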

Data Availability

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was sponsored in part by the National Natural Science Foundation of Guangdong under Grant 2020A1515011409, in part by the Key-Area Research and Development Program of Guangdong Province under Grants 2019B01015300 and 2021B0101190003, in part by the Special Project of Science and Technology Innovation Strategy of Guangdong Province under Grant 2021A1414030004, in part by the Key Program of NSFC-Guangdong Joint Funds under Grants U1801263 and U2001201, in part by the Provincial Agricultural Science and Technology Innovation and Extension Project of Guangdong Province under Grant 2019KJ147, and in part by the Guangdong Provincial Key Laboratory of Cyber-Physical System under Grant 2020B1212060069.

Author information

Contributions

Zhuowei Wang contributed to the conception of the study; Weida Lin performed the experiment; Weida Lin and Yang Wang contributed significantly to the analysis and manuscript preparation; Weida Lin performed the data analyses and wrote the manuscript; Weida Lin, Zhuowei Wang, Lianglun Cheng, and Xiaoyu Song helped perform the analysis through constructive discussions.

Corresponding author

Correspondence to Yang Wang.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent to publish

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, Z., Lin, W., Cheng, L. et al. Multi-branch detection network based on trigger attention for pedestrian detection under occlusion. Appl Intell 53, 6119–6132 (2023). https://doi.org/10.1007/s10489-022-03747-2

  • DOI: https://doi.org/10.1007/s10489-022-03747-2
