Abstract
Many vehicle and wheel detection methods based on convolutional neural networks are hampered by a lack of training data and by poor performance on small objects. To address these problems, we present a novel optimized SSD algorithm with multi-concatenation modules, aiming to improve small object detection. In each multi-concatenation module, feature maps from different layers are concatenated: a feature map from a shallow layer with more location information, a feature map from an intermediate layer, and a feature map from a deep layer with rich semantic information. An SEBlock is then employed to re-weight the fused feature map and improve the quality of the representation. Furthermore, to facilitate the study of vision-based vehicle and wheel detection, we establish a large-scale benchmark dataset of 8209 images comprising five object categories: truck, pickup, tractor, car, and wheel. On the Pascal VOC 2007 test set, our network achieves 78.7% mAP, 1.5% higher than SSD. On the KITTI dataset, the proposed method reaches 71.4% mAP, surpassing SSD by 3.5%. Experimental results further show that the proposed method yields better detection performance on small objects.
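The fusion step described above can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the channel counts, the choice to fuse at the intermediate layer's resolution, and the 1×1 projection convolutions are illustrative assumptions; only the overall pattern (concatenate shallow, intermediate, and deep feature maps, then re-weight channels with a squeeze-and-excitation block) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight channels by global context (Hu et al. 2018)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)     # excite: per-channel re-weighting

class MultiConcatModule(nn.Module):
    """Concatenate shallow, intermediate, and deep feature maps, then SE re-weight.
    Channel counts and the fusion resolution are assumptions for illustration."""
    def __init__(self, c_shallow, c_mid, c_deep, out_channels=256):
        super().__init__()
        # 1x1 convolutions project each source map to a common channel count
        self.p_shallow = nn.Conv2d(c_shallow, out_channels, kernel_size=1)
        self.p_mid = nn.Conv2d(c_mid, out_channels, kernel_size=1)
        self.p_deep = nn.Conv2d(c_deep, out_channels, kernel_size=1)
        self.se = SEBlock(3 * out_channels)

    def forward(self, shallow, mid, deep):
        size = mid.shape[2:]  # fuse at the intermediate layer's spatial resolution
        f_shallow = F.interpolate(self.p_shallow(shallow), size=size,
                                  mode='bilinear', align_corners=False)
        f_mid = self.p_mid(mid)
        f_deep = F.interpolate(self.p_deep(deep), size=size,
                               mode='bilinear', align_corners=False)
        fused = torch.cat([f_shallow, f_mid, f_deep], dim=1)  # channel concat
        return self.se(fused)  # SEBlock re-weights the concatenated map
```

The shallow map keeps fine localization cues for small objects, while the deep map contributes semantics; the SE re-weighting lets the network emphasize whichever fused channels are most informative.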
References
Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. Adv Neural Inf Proces Syst 2016:379–387
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 764–773
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Felzenszwalb PF, Girshick RB, McAllester D (2010) Cascade object detection with deformable part models. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 2241–2248
Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659
Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237
Girshick R (2015) Fast r-cnn. Proc IEEE Int Conf Comput Vis 2015:1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. Proc IEEE Int Conf Comput Vis 2017:2961–2969
Hoiem D, Chodpathumwan Y, Dai Q (2012) Diagnosing error in object detectors. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 340–353
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Jaiswal A, Wu Y, AbdAlmageed W, Masi I, Natarajan P (2019) AIRD: adversarial learning framework for image repurposing detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11330–11339
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 845–853
Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230
Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: Proceedings international conference on image processing, vol 1, pp I–I
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Cham, pp 740–755
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Proces Syst:91–99
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-First AAAI conference on artificial intelligence, vol 4, p 12
Uijlings JR, Van De Sande KEA, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Yuan Y, Xiong Z, Wang Q (2016) An incremental framework for video-based traffic sign detection, tracking, and recognition. IEEE Trans Intell Transp Syst 18(7):1918–1929
Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434
Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5813–5821
Zheng L, Fu C, Zhao Y (2018) Extend the shallow part of single shot multibox detector via convolutional neural network. In: Tenth international conference on digital image processing (ICDIP 2018). International Society for Optics and Photonics, 10806: 1080613
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, Cham, pp 391–405
Acknowledgements
The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant Nos. 61673299, 61203247, 61573259, 61573255 and 61876218. This work was also supported by the Fundamental Research Funds for the Central Universities and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).
Cite this article
Fu, J., Zhao, C., Xia, Y. et al. Vehicle and wheel detection: a novel SSD-based approach and associated large-scale benchmark dataset. Multimed Tools Appl 79, 12615–12634 (2020). https://doi.org/10.1007/s11042-019-08523-y