Inception single shot multi-box detector with affinity propagation clustering and their application in multi-class vehicle counting

Applied Intelligence

Abstract

Multi-class vehicle detection and counting in video-based traffic surveillance is challenging when both real-time performance and acceptable precision are required. This paper proposes a modified single shot multi-box convolutional neural network, named Inception-SSD (ISSD), for vehicle detection, and a centroid matching algorithm for vehicle counting. An Inception-like block replaces the extra feature layers of the original SSD to handle multi-scale vehicle detection and improve the detection of smaller vehicles. Non-Maximum Suppression (NMS) is replaced with Affinity Propagation Clustering (APC) to improve the detection of nearby occluded vehicles. For a 300 × 300 input image on the PASCAL VOC 2007 test set, the proposed ISSD achieves a mean Average Precision (mAP) of 79.3 and runs at 52.3 frames per second on an NVIDIA RTX2080Ti. ISSD with APC yields a 2.7% improvement in mAP over the original SSD300 while almost retaining its time efficiency. With the centroid matching algorithm, vehicles are counted class-wise with a weighted F1 of 98.5%, which is higher than that of other recent works.
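
The weighted F1 reported for class-wise counting can be read as the support-weighted mean of the per-class F1 scores. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function names, the class labels, and the counts in `example_counts` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    # Return 0.0 when the class has no detections and no ground truth.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Support-weighted mean of per-class F1 scores.

    `counts` maps a class name to (tp, fp, fn). The support of a class
    is its number of ground-truth instances, tp + fn.
    """
    total_support = sum(tp + fn for tp, _, fn in counts.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum(
        (tp + fn) * f1_score(tp, fp, fn) for tp, fp, fn in counts.values()
    )
    return weighted_sum / total_support


# Hypothetical per-class counting results (not taken from the paper).
example_counts = {
    "car": (90, 5, 10),
    "bus": (18, 2, 2),
    "truck": (8, 1, 2),
}

if __name__ == "__main__":
    print(f"weighted F1 = {weighted_f1(example_counts):.4f}")
```

Weighting by support means that frequent classes such as cars dominate the score, so a high weighted F1 can coexist with weaker performance on rare vehicle classes.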



Acknowledgements

This work was funded by Vandi Technologies PTE LTD, Singapore (Grant No. VANDI/PS01/NITT1821, dated 10-09-2018).

Author information

Corresponding author

Correspondence to Varun P. Gopi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Harikrishnan, P.M., Thomas, A., Gopi, V.P. et al. Inception single shot multi-box detector with affinity propagation clustering and their application in multi-class vehicle counting. Appl Intell 51, 4714–4729 (2021). https://doi.org/10.1007/s10489-020-02127-y

  • DOI: https://doi.org/10.1007/s10489-020-02127-y
