Reparameterized dilated architecture: A wider field of view for pedestrian detection

Applied Intelligence

Abstract

With continuing advances in computer vision, state-of-the-art (SOTA) pedestrian detectors have reached new levels of performance. Despite this progress, the limited receptive fields of most detectors make it difficult to build global information dependencies and context awareness, which particularly affects the detection of edge and small pedestrian targets. Our proposed solution, reparameterized dilated convolution (RDConv), strategically employs sawtooth dilation rates to broaden the receptive field without increasing computational cost: RDConv has the same cost as a small convolutional kernel but a larger receptive field, which enables comprehensive modeling of the relationship between pedestrians and their environment and thereby enhances context awareness. To capture the pedestrian information dependencies that are crucial for edge and small-target detection, we introduce the group multihead self-attention (G-MSA) mechanism. To overcome the high computational cost and limited interaction of traditional self-attention schemes, we adopt deep separation and supplementary boundary feature computation. RDConv and G-MSA are integrated into a multibranch framework to assess information flow interactions. Because convolution and self-attention place different demands on the activation function, we propose the dynamic boundary (DB) activation function, which adaptively adjusts the nonlinearity and gradient of the information in each layer of the network to suit this hybrid structure that merges the two methods. Applied to YOLOv5s and evaluated on the CityPersons, Caltech Pedestrian, and PASCAL VOC datasets, our approach achieves 33.61 AP0.5, 61.41 AP0.5, and 92.08 mAP (the last with YOLOv5m), respectively. The results on all three datasets confirm the effectiveness of our method.
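The claim that dilation widens the receptive field without adding cost can be checked with simple arithmetic. The Python sketch below is not the authors' RDConv: it assumes that "sawtooth" rates mean a short increasing cycle such as 1, 2, 3 (the pattern used in hybrid dilated convolution), assumes stride-1 3×3 layers, and ignores the reparameterization step entirely; all function names are hypothetical. It computes the receptive field of a stack of dilated layers, counts the weights, and checks along one axis whether any input position inside the receptive field is never sampled (the "gridding" effect).

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (pixels per axis) of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        # A k-tap kernel with dilation d spans (k - 1) * d + 1 pixels.
        rf += (kernel_size - 1) * d
    return rf


def sampled_offsets(kernel_size, dilations):
    """1-D input offsets that actually reach one output position."""
    assert kernel_size % 2 == 1, "odd kernel sizes keep the taps centred"
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        taps = [(i - half) * d for i in range(kernel_size)]
        # Composing layers adds every existing offset to every new tap.
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def has_gridding(kernel_size, dilations):
    """True if some position inside the receptive field is never sampled."""
    radius = (receptive_field(kernel_size, dilations) - 1) // 2
    offsets = sampled_offsets(kernel_size, dilations)
    return any(o not in offsets for o in range(-radius, radius + 1))


def weight_count(kernel_size, dilations):
    """Weights per input/output channel pair; dilation does not change it."""
    return len(dilations) * kernel_size * kernel_size


if __name__ == "__main__":
    for rates in ([1, 1, 1], [2, 2, 2], [1, 2, 3]):  # plain, constant, sawtooth
        print(rates, receptive_field(3, rates), has_gridding(3, rates),
              weight_count(3, rates))
```

Under these assumptions the three stacks print `[1, 1, 1] 7 False 27`, `[2, 2, 2] 13 True 27`, and `[1, 2, 3] 13 False 27`: the sawtooth stack reaches the same 13-pixel receptive field as the constant-rate stack with the same 27 weights per channel pair, but without leaving unsampled holes.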


Figs. 1–16 (figure images not included)

Data availability

The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Contributions

All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Lixiong Gong, Xiao Huang, Jialin Chen, Miaoling Xiao, and Yinkang Chao. Xiao Huang contributed equally to this work and should be considered a co-first author. The first draft of the manuscript was written by Xiao Huang and Lixiong Gong, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiao Huang.

Ethics declarations

Ethical and informed consent

This article does not contain any studies with human participants or animals performed by any of the authors. The datasets used in this manuscript are derived from publicly available datasets and may be obtained from the appropriate authors upon reasonable request.

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, L., Huang, X., Chen, J. et al. Reparameterized dilated architecture: A wider field of view for pedestrian detection. Appl Intell 54, 1525–1544 (2024). https://doi.org/10.1007/s10489-023-05255-3

  • DOI: https://doi.org/10.1007/s10489-023-05255-3