Abstract
Vehicle detection in video is a valuable but challenging technology in traffic monitoring. Owing to its real-time speed, the Single Shot MultiBox Detector (SSD) is often used to detect vehicles in images; however, its accuracy degradation is a significant problem in video vehicle detection. To address this problem in real time, this paper enhances detection performance by improving the SSD and exploiting the relationships among inter-frame detections. We propose a feature-fused SSD detector and a Tracking-guided Detections Optimizing (TDO) strategy for fast and effective video vehicle detection. We introduce a lightweight feature-fusion sub-network into the standard SSD network, which aggregates deeper-layer features into shallower-layer features to enrich the semantic information of the shallower layers. At the post-processing stage of the feature-fused SSD, non-maximum suppression (NMS) is replaced by the TDO strategy, which links vehicles across frames with a fast tracking algorithm. Missed detections can thus be compensated by the propagated results, and the confidence of the final results can be optimized in the temporal domain. Our approach significantly improves the temporal consistency of the detection results with low computational complexity. We evaluate the proposed method on two datasets. Experiments on our labeled highway dataset show that the mean average precision (mAP) of our method is 8.2% higher than that of the base detector. The feature-fused SSD runs at 27.1 frames per second (fps), which is suitable for real-time detection. Experiments on the ImageNet VID dataset show that the proposed method is also comparable with state-of-the-art detectors.
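The two ideas in the abstract can be illustrated concretely. The sketch below is not the authors' implementation: the real fusion sub-network is learned (convolutions before merging), and the real TDO strategy uses a fast tracker to propagate boxes. Here, a plain upsample-and-add stands in for feature fusion, and an IoU-based match between current detections and boxes propagated from the previous frame stands in for the tracking-guided linking; all function names (`fuse`, `tdo_merge`) are illustrative.

```python
import numpy as np

def upsample2x(fmap):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fuse(shallow, deep):
    # Upsample the deeper (more semantic) map and add it to the shallower
    # one. The paper uses a learned sub-network; this is a stand-in.
    return shallow + upsample2x(deep)

def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def tdo_merge(detections, propagated, iou_thr=0.5):
    # detections / propagated: lists of (box, score). A propagated box
    # that overlaps a current detection refines its confidence by
    # temporal averaging; one with no match is kept as-is, compensating
    # a missed detection in the current frame.
    merged = []
    unmatched = list(propagated)
    for box, score in detections:
        match = next((p for p in unmatched if iou(box, p[0]) >= iou_thr), None)
        if match is not None:
            unmatched.remove(match)
            merged.append((box, (score + match[1]) / 2.0))
        else:
            merged.append((box, score))
    merged.extend(unmatched)  # propagated boxes with no current match
    return merged
```

For example, a current detection scored 0.9 that overlaps a propagated box scored 0.7 ends up with confidence 0.8, while a propagated box with no overlapping detection survives into the output, which is how missed detections get compensated.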
Acknowledgements
This research was funded by the National Natural Science Foundation of China (62072053), the Fundamental Research Funds for the Central Universities (300102249317), the Natural Science Foundation of Shaanxi Province (2019SF-258), and the Key R&D Project of the Shaanxi Science and Technology Department (2019YFB1600500).
Yang, Y., Song, H., Sun, S. et al. A fast and effective video vehicle detection method leveraging feature fusion and proposal temporal link. J Real-Time Image Proc 18, 1261–1274 (2021). https://doi.org/10.1007/s11554-021-01121-y