Reparameterized dilated architecture: A wider field of view for pedestrian detection

Gong, Lixiong; Huang, Xiao; Chen, Jialin; Xiao, Miaoling; Chao, Yinkang

doi:10.1007/s10489-023-05255-3

Published: 09 January 2024

Applied Intelligence, Volume 54, pages 1525–1544 (2024)

Lixiong Gong^1,2,na1^,
Xiao Huang^1,2,na1^ (ORCID: orcid.org/0009-0007-9490-8595),
Jialin Chen^1^,
Miaoling Xiao^1^ &
Yinkang Chao^1^

Abstract

With continued advances in computer vision, state-of-the-art (SOTA) pedestrian detection methods have reached new levels of performance. Despite this progress, most detectors have limited receptive fields, which makes it difficult to build global information dependencies and context awareness. These constraints particularly affect the detection of edge and small pedestrian targets. Our proposed solution, reparameterized dilated convolution (RDConv), employs sawtooth dilation rates to broaden the receptive field without increasing computational cost. RDConv has the same cost as a small convolutional kernel but offers a larger receptive field, which allows it to model the relationship between pedestrians and their environment more comprehensively and so enhances context awareness. To capture the pedestrian information dependencies that are crucial for edge and small-target detection, we introduce the group multihead self-attention (G-MSA) mechanism. To overcome the high computational cost and limited interaction of traditional self-attention schemes, G-MSA adopts deep separation and supplementary boundary feature computation. RDConv and G-MSA are integrated into a multibranch framework to assess information flow interactions. Because convolution and self-attention place different requirements on activation functions, we propose the dynamic boundary (DB) activation function. It adaptively adjusts the nonlinearity and gradient of the information from each layer of the network, accommodating the structure that merges the two methods. Applied to YOLOv5s and tested on the CityPersons, Caltech Pedestrian, and PASCAL VOC datasets, our approach achieves 33.61 AP_0.5, 61.41 AP_0.5, and 92.08 mAP (YOLOv5m), respectively. The results across the three datasets affirm the effectiveness of our method.
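
As a rough illustration of why dilation widens the field of view at a fixed parameter cost, the sketch below compares three stacked 3×3 convolutions with and without increasing dilation rates. The rates (1, 2, 3), the kernel size, the channel counts, and all function names are assumptions chosen for illustration; the abstract does not state the rates RDConv uses, and this is not the authors' implementation.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span along one axis covered by a kernel with the given dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field along one axis of stride-1 convolutions applied in sequence."""
    rf = 1
    for d in dilations:
        # Each stride-1 layer extends the field by (kernel_size - 1) * dilation.
        rf += (kernel_size - 1) * d
    return rf


def weights_per_layer(kernel_size: int, channels_in: int, channels_out: int) -> int:
    """Weight count of a 2D convolution (bias ignored); dilation does not appear."""
    return kernel_size * kernel_size * channels_in * channels_out


if __name__ == "__main__":
    plain = [1, 1, 1]      # ordinary 3x3 convolutions
    sawtooth = [1, 2, 3]   # hypothetical increasing rates, illustration only
    print("plain receptive field:   ", stacked_receptive_field(3, plain))
    print("dilated receptive field: ", stacked_receptive_field(3, sawtooth))
    print("weights per 3x3 layer:   ", weights_per_layer(3, 64, 64))
```

Under these assumptions the dilated stack covers 13 pixels per axis against 7 for the plain stack, while each layer keeps the same 3×3 weight count, since dilation only changes where the kernel samples the input.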

Data availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Author information

Lixiong Gong and Xiao Huang contributed equally to this work and should be considered co-first authors.

Authors and Affiliations

School of Mechanical Engineering, Hubei University of Technology, Wuhan, China
Lixiong Gong, Xiao Huang, Jialin Chen, Miaoling Xiao & Yinkang Chao
Hubei Key Laboratory of Modern Manufacturing Quality Engineering, Wuhan, China
Lixiong Gong & Xiao Huang

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Lixiong Gong, Xiao Huang, Jialin Chen, Miaoling Xiao, and Yinkang Chao. Lixiong Gong and Xiao Huang contributed equally to this work and should be considered co-first authors. The first draft of the manuscript was written by Xiao Huang and Lixiong Gong, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiao Huang.

Ethics declarations

Ethical and informed consent

This article does not contain any studies with human participants or animals performed by any of the authors. The datasets used in the manuscript are derived from publicly available data sets and may be obtained from the appropriate authors upon reasonable request.

Conflict of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gong, L., Huang, X., Chen, J. et al. Reparameterized dilated architecture: A wider field of view for pedestrian detection. Appl Intell 54, 1525–1544 (2024). https://doi.org/10.1007/s10489-023-05255-3

Accepted: 23 December 2023
Published: 09 January 2024
Issue Date: January 2024
DOI: https://doi.org/10.1007/s10489-023-05255-3
