Abstract
In autonomous driving, complex backgrounds and mutual occlusion between multiple targets hinder the detector's judgment and cause missed detections. If a target is only detected again once it is at close range, the vehicle may not be able to respond in time, which can lead to a fatal accident. Driver-assistance applications therefore require a model that can accurately identify partially occluded targets in complex backgrounds and can perform short-term tracking and early warning for completely occluded objects. This paper proposes a YOLOv3-based method that improves detection accuracy while still supporting real-time operation, and that issues real-time warnings for objects that are completely occluded. First, we obtain more suitable prior box (anchor) settings through class-wise K-means clustering. To address the background noise that the max-pooling operation of the original CBAM tends to introduce, we propose AS-CBAM (Adaptive Selection Convolutional Block Attention Module) and combine it with HDC (Hybrid Dilated Convolution) to maximize the receptive field and refine the features. A 1×1 convolution is used to limit the growth in the number of parameters. DIoU-NMS replaces traditional NMS. In addition, a tracking algorithm based on Kalman filtering and Hungarian matching is introduced to improve the system's ability to recognize occluded objects. Compared with the original YOLOv3, the proposed method increases the mAP by 1.32% and 1.47% on KITTI and UA-DETRAC, respectively. It also runs at 35.07 FPS and achieves a larger accuracy improvement (90.36% vs. 85.71%) on Object-Mask, a dataset that focuses on occlusion conditions. The proposed algorithm is therefore better suited to autonomous driving applications.
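To make the DIoU-NMS step mentioned above more concrete, the following is a minimal sketch of DIoU-NMS as it is commonly defined: a lower-scoring box is suppressed when its IoU with a kept box, minus the squared center distance normalized by the squared diagonal of the smallest enclosing box, exceeds a threshold. It is not the authors' implementation. The box format `(x1, y1, x2, y2)`, the function names `diou` and `diou_nms`, and the threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def diou(a: Box, b: Box) -> float:
    """Return IoU(a, b) minus the normalized squared center distance."""
    # Intersection and union areas.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distance between the two box centers.
    dx = (a[0] + a[2]) / 2 - (b[0] + b[2]) / 2
    dy = (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2
    center_dist_sq = dx * dx + dy * dy

    # Squared diagonal of the smallest box enclosing both a and b.
    enc_w = max(a[2], b[2]) - min(a[0], b[0])
    enc_h = max(a[3], b[3]) - min(a[1], b[1])
    diag_sq = enc_w * enc_w + enc_h * enc_h

    return iou - center_dist_sq / diag_sq if diag_sq > 0 else iou


def diou_nms(boxes: Sequence[Box], scores: Sequence[float],
             threshold: float = 0.5) -> List[int]:
    """Greedy NMS using DIoU as the overlap criterion.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if no higher-scoring kept box suppresses it.
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(diou_nms(boxes, scores))
```

Because the center-distance penalty lowers the overlap score of boxes whose centers are far apart, two heavily overlapping boxes that belong to different, mutually occluding vehicles are less likely to be merged into one detection than under plain IoU-based NMS.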
Acknowledgements
This work was supported by the National Natural Science Foundation of China [grant number U1733119] and the Central University basic scientific research business fee project of Civil Aviation University of China [grant number 3122018C001].
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, K., Liu, M. YOLOv3-MT: A YOLOv3 using multi-target tracking for vehicle visual detection. Appl Intell 52, 2070–2091 (2022). https://doi.org/10.1007/s10489-021-02491-3
DOI: https://doi.org/10.1007/s10489-021-02491-3