Abstract
Many vehicle and wheel detection methods based on convolutional neural networks are hampered by a lack of training data and by poor performance on small objects. To address these problems, we present a novel optimized SSD algorithm with multi-concatenation modules, aiming to improve small object detection. In each multi-concatenation module, feature maps from different layers are concatenated: a feature map from a shallow layer with more location information, a feature map from an intermediate layer, and a feature map from a deep layer with rich semantic information. An SEBlock is then employed to re-weight the fused feature map and improve the quality of the representation. Furthermore, to facilitate the study of vision-based vehicle and wheel detection, we establish a large-scale benchmark dataset of 8209 images comprising five object categories: truck, pickup, tractor, car, and wheel. On the Pascal VOC 2007 test set, our network achieves 78.7% mAP, 1.5% higher than SSD. On the KITTI dataset, the proposed method reaches 71.4% mAP, surpassing SSD by 3.5%. Experimental results further show that the proposed method yields better detection performance on small objects.
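The fusion step described above can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the channel counts, the choice to fuse at the intermediate layer's resolution, and the 1×1 projection convolutions are illustrative assumptions; only the overall pattern (concatenate shallow, intermediate, and deep feature maps, then re-weight channels with a squeeze-and-excitation block) follows the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-excitation: re-weight channels by global context (Hu et al. 2018)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)     # excite: per-channel re-weighting

class MultiConcatModule(nn.Module):
    """Concatenate shallow, intermediate, and deep feature maps, then SE re-weight.
    Channel counts and the fusion resolution are assumptions for illustration."""
    def __init__(self, c_shallow, c_mid, c_deep, out_channels=256):
        super().__init__()
        # 1x1 convolutions project each source map to a common channel count
        self.p_shallow = nn.Conv2d(c_shallow, out_channels, kernel_size=1)
        self.p_mid = nn.Conv2d(c_mid, out_channels, kernel_size=1)
        self.p_deep = nn.Conv2d(c_deep, out_channels, kernel_size=1)
        self.se = SEBlock(3 * out_channels)

    def forward(self, shallow, mid, deep):
        size = mid.shape[2:]  # fuse at the intermediate layer's spatial resolution
        f_shallow = F.interpolate(self.p_shallow(shallow), size=size,
                                  mode='bilinear', align_corners=False)
        f_mid = self.p_mid(mid)
        f_deep = F.interpolate(self.p_deep(deep), size=size,
                               mode='bilinear', align_corners=False)
        fused = torch.cat([f_shallow, f_mid, f_deep], dim=1)  # channel concat
        return self.se(fused)  # SEBlock re-weights the concatenated map
```

The shallow map keeps fine localization cues for small objects, while the deep map contributes semantics; the SE re-weighting lets the network emphasize whichever fused channels are most informative.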
References
Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks. Adv Neural Inf Proces Syst 2016:379–387
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 764–773
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338
Felzenszwalb PF, Girshick RB, McAllester D (2010) Cascade object detection with deformable part models. In: IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 2241–2248
Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659
Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237
Girshick R (2015) Fast r-cnn. Proc IEEE Int Conf Comput Vis 2015:1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp 249–256
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. Proc IEEE Int Conf Comput Vis 2017:2961–2969
Hoiem D, Chodpathumwan Y, Dai Q (2012) Diagnosing error in object detectors. In: European conference on computer vision. Springer, Berlin/Heidelberg, pp 340–353
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708
Jaiswal A, Wu Y, AbdAlmageed W, Masi I, Natarajan P (2019) AIRD: adversarial learning framework for image repurposing detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 11330–11339
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: Towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 845–853
Li J, Liang X, Wei Y, Xu T, Feng J, Yan S (2017) Perceptual generative adversarial networks for small object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1222–1230
Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: Proceedings international conference on image processing, vol 1, pp I–I
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Cham, pp 740–755
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, Cham, pp 21–37
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Adv Neural Inf Proces Syst:91–99
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-First AAAI conference on artificial intelligence, vol 4, p 12
Uijlings JR, Van De Sande KEA, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Yuan Y, Xiong Z, Wang Q (2016) An incremental framework for video-based traffic sign detection, tracking, and recognition. IEEE Trans Intell Transp Syst 18(7):1918–1929
Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28(7):3423–3434
Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5813–5821
Zheng L, Fu C, Zhao Y (2018) Extend the shallow part of single shot multibox detector via convolutional neural network. In: Tenth international conference on digital image processing (ICDIP 2018). International Society for Optics and Photonics, 10806: 1080613
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, Cham, pp 391–405
Acknowledgements
The authors would like to thank the anonymous reviewers for their critical and constructive comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant Nos. 61673299, 61203247, 61573259, 61573255 and 61876218. This work was also supported by the Fundamental Research Funds for the Central Universities and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR).
Cite this article
Fu, J., Zhao, C., Xia, Y. et al. Vehicle and wheel detection: a novel SSD-based approach and associated large-scale benchmark dataset. Multimed Tools Appl 79, 12615–12634 (2020). https://doi.org/10.1007/s11042-019-08523-y