Abstract
Most anchor-free object detectors trained on traffic object datasets suffer from intersample imbalance, underutilization of multiscale features, and long training times. As a result, their efficiency and accuracy can drop significantly for samples from sparsely represented categories and for small objects. To address these problems, we propose a novel anchor-free approach, GSA-DLA34, which combines a Gaussian kernel, sample weights, and attention. Its main features are as follows. First, pyramid squeeze attention (PSA) is added after the backbone network to enhance multiscale representations of traffic objects. Second, to improve the localization of objects from sparsely represented categories and of small objects, we design active sample weights for the regression loss so that the available training information is used more effectively. Third, an elliptical Gaussian sampling module (EGSM) with a controllable Gaussian kernel shape is incorporated into the classification and regression branches to accelerate network training. The results show that GSA-DLA34 offers a significant advantage in balancing training time, inference speed, and accuracy. With an average precision of 89% on the PASCAL VOC dataset and an inference speed of 55.2 FPS on an RTX 2080 Ti, GSA-DLA34 can significantly improve human-vehicle recognition accuracy.
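The abstract does not give the EGSM equations, so the following is only a minimal sketch of how an elliptical Gaussian target with a controllable kernel shape might be built, assuming a TTFNet-style formulation: the standard deviations are proportional to the box width and height and scaled by a shape parameter alpha (the default 0.54 is an assumption borrowed from TTFNet, not a value stated here). The regression weighting (Gaussian value times the log of the box area, restricted to pixels above a threshold) is likewise an assumed, hypothetical stand-in for the paper's active sample weights. The function names `elliptical_gaussian_heatmap` and `regression_sample_weights` and the `threshold` parameter are illustrative inventions.

```python
import math


def elliptical_gaussian_heatmap(height, width, box, alpha=0.54):
    """Assumed TTFNet-style elliptical Gaussian target for one box (not the paper's exact EGSM)."""
    # box = (x1, y1, x2, y2) in feature-map coordinates
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # integer center pixel of the box
    cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)
    # alpha controls the kernel shape: the standard deviations follow the box
    # width and height, so elongated boxes get elongated (elliptical) Gaussians
    sigma_x = max(alpha * w / 6.0, 1e-6)
    sigma_y = max(alpha * h / 6.0, 1e-6)
    return [
        [
            math.exp(-((x - cx) ** 2) / (2 * sigma_x ** 2)
                     - ((y - cy) ** 2) / (2 * sigma_y ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def regression_sample_weights(heatmap, box, threshold=0.3):
    """Hypothetical regression weights derived from the Gaussian target."""
    x1, y1, x2, y2 = box
    area = max((x2 - x1) * (y2 - y1), 1.0)
    # pixels whose Gaussian value reaches the (hypothetical) threshold
    # become regression samples for this box
    samples = {
        (y, x): g
        for y, row in enumerate(heatmap)
        for x, g in enumerate(row)
        if g >= threshold
    }
    if not samples:
        return {}
    total = sum(samples.values())
    # weights are proportional to the Gaussian value and normalized so that
    # they sum to log(area): a box's total weight grows only logarithmically
    # with its size, which limits how much large boxes dominate small ones
    return {pos: math.log(area) * g / total for pos, g in samples.items()}


box = (2, 4, 18, 8)  # a wide 16 x 4 box
hm = elliptical_gaussian_heatmap(12, 20, box)
print(hm[6][10])  # 1.0 at the box center
weights = regression_sample_weights(hm, box)
print(len(weights))  # 5 pixels along the long (horizontal) axis
```

Because sigma_x is four times sigma_y for this box, the kernel spreads horizontally and the regression samples lie along the box's long axis. Changing alpha widens or narrows the kernel, which is the kind of shape control the abstract attributes to EGSM.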
Acknowledgements
This work is supported by the Liaoning Provincial Science and Technology Department (No. 1655706734383).
About this article
Cite this article
Chen, X., Lv, N., Lv, S. et al. GSA-DLA34: a novel anchor-free method for human-vehicle detection. Appl Intell 53, 24619–24637 (2023). https://doi.org/10.1007/s10489-023-04788-x
DOI: https://doi.org/10.1007/s10489-023-04788-x