
Vehicle and wheel detection: a novel SSD-based approach and associated large-scale benchmark dataset

Multimedia Tools and Applications

Abstract

Many convolutional neural network based vehicle and wheel detection methods are hindered by a shortage of training data and by poor performance on small objects. To address these problems, we present a novel optimized SSD algorithm with multi-concatenation modules, aiming to improve small object detection. In the multi-concatenation module, features from different layers are concatenated: a shallow-layer feature map that carries more location information, an intermediate-layer feature map, and a deep-layer feature map with rich semantic information. SEBlock is then employed to re-weight the concatenated feature map and improve the quality of the representation. Furthermore, to facilitate the study of vision-based vehicle and wheel detection, we establish a large-scale benchmark dataset of 8209 images covering five object categories: truck, pickup, tractor, car, and wheel. On the Pascal VOC 2007 test set, our network achieves 78.7% mAP, 1.5% higher than SSD. On the KITTI dataset, the proposed method reaches 71.4% mAP, surpassing SSD by 3.5%. In addition, experimental results show that the proposed method yields better detection performance on small objects.
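To make the idea concrete, the abstract's multi-concatenation module can be read as: project feature maps from a shallow, an intermediate, and a deep layer, resize them to a common resolution, concatenate them along the channel axis, and re-weight the fused map with a squeeze-and-excitation block. The following is a minimal PyTorch-style sketch of that reading; the layer choices, channel counts, fusion resolution, and module names (MultiConcat, SEBlock) are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of a multi-concatenation module with SE re-weighting.
# Channel counts, feature-map sizes and class names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight channels of the fused feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        w = F.adaptive_avg_pool2d(x, 1).flatten(1)        # squeeze: global context per channel
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(w))))  # excitation: channel weights in (0, 1)
        return x * w.view(x.size(0), -1, 1, 1)            # re-weight the feature map

class MultiConcat(nn.Module):
    """Fuse shallow (location-rich), intermediate and deep (semantic-rich) features."""
    def __init__(self, c_shallow, c_mid, c_deep, c_out):
        super().__init__()
        # 1x1 convolutions bring each source to a common channel width
        self.p_shallow = nn.Conv2d(c_shallow, c_out, 1)
        self.p_mid = nn.Conv2d(c_mid, c_out, 1)
        self.p_deep = nn.Conv2d(c_deep, c_out, 1)
        self.se = SEBlock(3 * c_out)

    def forward(self, f_shallow, f_mid, f_deep):
        size = f_mid.shape[-2:]  # fuse at the intermediate layer's spatial resolution
        s = F.interpolate(self.p_shallow(f_shallow), size=size, mode='bilinear', align_corners=False)
        m = self.p_mid(f_mid)
        d = F.interpolate(self.p_deep(f_deep), size=size, mode='bilinear', align_corners=False)
        fused = torch.cat([s, m, d], dim=1)  # channel-wise concatenation of the three maps
        return self.se(fused)                # SE re-weighting of the new feature map

# Usage with VGG-like feature-map shapes (the 38/19/10 sizes are assumptions)
if __name__ == "__main__":
    module = MultiConcat(c_shallow=512, c_mid=1024, c_deep=512, c_out=256)
    f1 = torch.randn(1, 512, 38, 38)
    f2 = torch.randn(1, 1024, 19, 19)
    f3 = torch.randn(1, 512, 10, 10)
    print(module(f1, f2, f3).shape)  # torch.Size([1, 768, 19, 19])
```

The fused, re-weighted map would then feed the SSD prediction heads in place of (or alongside) the original single-layer features, which is how the concatenation is intended to help small objects.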

References

  1. Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883

  2. Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. Adv Neural Inf Proces Syst 2016:379–387

  3. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 764–773

  4. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338

  5. Felzenszwalb PF, Girshick RB, McAllester D (2010) Cascade object detection with deformable part models. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 2241–2248

  6. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659

  7. Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237

  8. Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

  9. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  10. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256

  11. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

  12. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  13. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  14. Hoiem D, Chodpathumwan Y, Dai Q (2012) Diagnosing error in object detectors. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 340–353

  15. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  16. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

  17. Jaiswal A, Wu Y, AbdAlmageed W, Masi I, Natarajan P (2019) AIRD: adversarial learning framework for image repurposing detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11330–11339

  18. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678

  19. Kong T, Yao A, Chen Y, Sun F (2016) HyperNet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 845–853

  20. Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230

  21. Lienhart R, Maydt J (2002) An extended set of Haar-like features for rapid object detection. In: Proceedings of the international conference on image processing, vol 1, pp I–I

  22. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft COCO: common objects in context. In: European conference on computer vision. Springer, Cham, pp 740–755

  23. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  24. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37

  25. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  26. Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767

  27. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  28. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Proces Syst:91–99

  29. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252

  30. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  31. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence, vol 4, p 12

  32. Uijlings JR, Van De Sande KEA, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171

  33. Yuan Y, Xiong Z, Wang Q (2016) An incremental framework for video-based traffic sign detection, tracking, and recognition. IEEE Trans Intell Transp Syst 18(7):1918–1929

  34. Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434

  35. Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5813–5821

  36. Zheng L, Fu C, Zhao Y (2018) Extend the shallow part of single shot multibox detector via convolutional neural network. In: Tenth international conference on digital image processing (ICDIP 2018). International Society for Optics and Photonics, vol 10806, p 1080613

  37. Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, Cham, pp 391–405

Acknowledgements

The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grants No. 61673299, 61203247, 61573259, 61573255, and 61876218. This work was also supported by the Fundamental Research Funds for the Central Universities and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).

Author information

Corresponding author

Correspondence to Cairong Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fu, J., Zhao, C., Xia, Y. et al. Vehicle and wheel detection: a novel SSD-based approach and associated large-scale benchmark dataset. Multimed Tools Appl 79, 12615–12634 (2020). https://doi.org/10.1007/s11042-019-08523-y
