Abstract
Vehicle detection in video frames has typically been treated the same way as vehicle detection in an isolated image. However, models designed for isolated images are blind to fast-moving vehicles and cannot localize moving targets that are partially occluded in the scene. To address this, we combine a classic moving-target detection method with a neural network method. First, we propose feeding three-frame differences into the YOLOv3 network as a second input; this input carries motion information from the preceding and following frames and helps detect partially occluded vehicles. Second, we restructure the network using Octave Convolution to reduce memory and computational cost while boosting accuracy. Experiments on the UA-DETRAC data set show that, compared with the original YOLOv3, using both methods together increases AP by 2.31%, recall by 4.01%, and precision by 3.10%, which demonstrates that the proposed method is effective.
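
The motion cue mentioned above can be illustrated with the classic three-frame difference. The following is a minimal sketch, not the authors' implementation: it assumes grayscale `uint8` frames, a hypothetical function name `three_frame_difference`, and an arbitrary threshold of 25. How the resulting map is fed into YOLOv3 as a second input is not specified in the abstract and is not shown here.

```python
import numpy as np

def three_frame_difference(prev, curr, nxt, threshold=25):
    """Classic three-frame difference (illustrative; threshold is an assumption).

    A pixel is marked as moving only if it changed both between the previous
    and current frame and between the current and next frame.
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    prev, curr, nxt = (f.astype(np.int16) for f in (prev, curr, nxt))
    changed_backward = np.abs(curr - prev) > threshold
    changed_forward = np.abs(nxt - curr) > threshold
    # Logical AND keeps only pixels that changed in both comparisons.
    return (changed_backward & changed_forward).astype(np.uint8)

# Toy sequence: a bright 2x3 block moving two pixels to the right per frame.
height, width = 6, 12
frames = []
for x in (1, 3, 5):
    frame = np.zeros((height, width), dtype=np.uint8)
    frame[2:4, x:x + 3] = 200
    frames.append(frame)

mask = three_frame_difference(*frames)
```

In this toy sequence the non-zero pixels of `mask` fall inside the block's position in the middle frame, which is why the AND of the two differences is commonly used to suppress the "ghost" left at the object's old and new positions.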

Acknowledgments
This work was supported by the National Natural Science Foundation of China (No. 61562009), the Open Fund Project in Semiconductor Power Device Reliability Engineering Center of Ministry of Education (No. ERCMEKFJJ2019-06), and the Guizhou University Introduced Talent Research Project (No. 2015-29).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Open access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hu, J., Liu, R., Chen, Z. et al. Octave convolution-based vehicle detection using frame-difference as network input. Vis Comput 39, 1503–1515 (2023). https://doi.org/10.1007/s00371-022-02425-1
DOI: https://doi.org/10.1007/s00371-022-02425-1