Abstract
Achieving real-time performance with acceptable precision in multi-class vehicle detection and counting for video-based traffic surveillance is challenging. This paper proposes a modified single shot multi-box convolutional neural network, named Inception-SSD (ISSD), for vehicle detection and a centroid matching algorithm for vehicle counting. An Inception-like block replaces the extra feature layers of the original SSD to handle multi-scale vehicle detection and improve the detection of smaller vehicles. Non-Maximum Suppression (NMS) is replaced with Affinity Propagation Clustering (APC) to improve the detection of nearby occluded vehicles. For a 300 × 300 input image on the PASCAL VOC 2007 test set, the proposed ISSD achieves a mean Average Precision (mAP) of 79.3; on an NVIDIA RTX2080Ti, the network attains a speed of 52.3 frames per second. ISSD with APC yields a 2.7% improvement in mAP over the original SSD300 while almost retaining its time efficiency. Using the centroid matching algorithm, vehicles are counted class-wise with a weighted F1 of 98.5%, which is superior to other recent works.
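The abstract does not spell out the centroid matching step, so the following is only a minimal sketch of one plausible form of it, not the authors' implementation. It assumes that each detection is reduced to a class label and a bounding-box centroid, that detections in consecutive frames are paired greedily by nearest same-class centroid, and that a vehicle is counted when its centroid crosses a horizontal counting line. The names `match_centroids`, `count_vehicles`, `max_dist`, and `line_y` are hypothetical.

```python
import math
from collections import Counter


def match_centroids(prev, curr, max_dist=50.0):
    """Greedily pair detections across two frames by nearest centroid.

    prev, curr: lists of (class_name, (x, y)) tuples.
    Only same-class pairs closer than max_dist (an assumed threshold,
    in pixels) are considered. Returns a list of (prev_idx, curr_idx).
    """
    candidates = []
    for i, (p_cls, (px, py)) in enumerate(prev):
        for j, (c_cls, (cx, cy)) in enumerate(curr):
            if p_cls != c_cls:
                continue
            dist = math.hypot(cx - px, cy - py)
            if dist <= max_dist:
                candidates.append((dist, i, j))

    # Closest pairs are accepted first; each detection is used at most once.
    candidates.sort()
    used_prev, used_curr, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_prev and j not in used_curr:
            used_prev.add(i)
            used_curr.add(j)
            pairs.append((i, j))
    return pairs


def count_vehicles(frames, line_y, max_dist=50.0):
    """Count vehicles per class as matched centroids cross y = line_y.

    frames: list of per-frame detection lists, each detection being
    (class_name, (x, y)). A crossing is registered when a matched
    centroid moves from above the line to on or below it (assumed
    direction of travel: increasing y).
    """
    counts = Counter()
    prev = []
    for curr in frames:
        for i, j in match_centroids(prev, curr, max_dist):
            cls, (_, y_prev) = prev[i]
            _, (_, y_curr) = curr[j]
            if y_prev < line_y <= y_curr:
                counts[cls] += 1
        prev = curr
    return counts
```

In this sketch the class label comes from the detector, so the per-class counts can be compared directly against ground truth to compute a class-weighted F1. A real system would also need to handle missed detections and vehicles travelling in the opposite direction, which this simplified version ignores.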







Acknowledgements
This work was funded by Vandi Technologies PTE LTD, Singapore (Grant No. VANDI/PS01/NITT1821, dated 10-09-2018).
Cite this article
Harikrishnan, P.M., Thomas, A., Gopi, V.P. et al. Inception single shot multi-box detector with affinity propagation clustering and their application in multi-class vehicle counting. Appl Intell 51, 4714–4729 (2021). https://doi.org/10.1007/s10489-020-02127-y
DOI: https://doi.org/10.1007/s10489-020-02127-y