GSA-DLA34: a novel anchor-free method for human-vehicle detection

Chen, Xinying; Lv, Na; Lv, Shuo; Zhang, Hao

doi:10.1007/s10489-023-04788-x

Published: 27 July 2023

Volume 53, pages 24619–24637, (2023)

Applied Intelligence

Xinying Chen¹, Na Lv¹ (ORCID: orcid.org/0000-0001-8456-4193), Shuo Lv¹ & Hao Zhang¹

Abstract

On traffic object datasets, most anchor-free object detectors suffer from intersample imbalance, underutilization of multiscale features, and long training times. As a result, both efficiency and accuracy may drop significantly for categories with few samples and for small objects. To address these problems, we propose a novel anchor-free approach, GSA-DLA34, which is based on a Gaussian kernel, sample weights, and attention. It has three main features. First, pyramid squeeze attention (PSA) is added after the backbone network to enhance multiscale traffic object representations. Second, to better localize objects from underrepresented categories and at small scales, we design active sample weights for the regression loss so that sample information is used more effectively. Third, an elliptical Gaussian sampling module (EGSM) with a controllable Gaussian kernel shape is incorporated into the classification and regression branches to accelerate network training. The results show that GSA-DLA34 has a significant advantage in balancing training time, inference speed, and accuracy. With an average precision of 89% on the PASCAL VOC dataset and an inference speed of 55.2 FPS on an RTX 2080 Ti, GSA-DLA34 can significantly improve human-vehicle recognition accuracy.
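To make the idea of a Gaussian kernel with a controllable, elliptical shape more concrete, the following is a minimal NumPy sketch. It is not the paper's EGSM, whose details are not given in the abstract. The function name `elliptical_gaussian`, the shape parameter `alpha`, and the rule that ties each standard deviation to the box size are all assumptions made for illustration.

```python
import numpy as np


def elliptical_gaussian(height, width, center, box_h, box_w, alpha=0.5):
    """Build a (height, width) map that peaks at 1.0 on `center`.

    Hypothetical illustration only: `alpha` scales the kernel, and the
    standard deviations are tied to the box width and height so that the
    kernel is elliptical for non-square boxes.
    """
    cx, cy = center  # (column, row) of the object centre
    # Assumed rule: sigma proportional to box size along each axis.
    sigma_x = max(alpha * box_w / 6.0, 1e-6)
    sigma_y = max(alpha * box_h / 6.0, 1e-6)
    # Row and column coordinate grids covering the whole map.
    ys, xs = np.mgrid[0:height, 0:width]
    exponent = (xs - cx) ** 2 / (2.0 * sigma_x ** 2) \
        + (ys - cy) ** 2 / (2.0 * sigma_y ** 2)
    return np.exp(-exponent)


# A wide, short box (e.g. a vehicle seen from the side).
heat = elliptical_gaussian(64, 64, center=(32, 20), box_h=12, box_w=36)
```

For a box that is wider than it is tall, the resulting map decays more slowly along the horizontal axis than along the vertical one, so positions along the long side of the object keep larger values than they would under a circular kernel.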

Acknowledgements

This work is supported by the Liaoning Provincial Science and Technology Department (No.1655706734383).

Author information

Authors and Affiliations

School of Computer and Communication Engineering, Dalian Jiaotong University, Dalian, 116028, China
Xinying Chen, Na Lv, Shuo Lv & Hao Zhang

Corresponding author

Correspondence to Na Lv.

About this article

Cite this article

Chen, X., Lv, N., Lv, S. et al. GSA-DLA34: a novel anchor-free method for human-vehicle detection. Appl Intell 53, 24619–24637 (2023). https://doi.org/10.1007/s10489-023-04788-x

Accepted: 11 June 2023
Published: 27 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04788-x
