Assessing the YOLO Series Through Empirical Analysis on the KITTI Dataset for Autonomous Driving

  • Conference paper
Intelligent Transport Systems. From Research and Development to the Market Uptake (INTSYS 2019)

Abstract

Computer vision and deep learning have been widely popularised since the turn of the 21st century, and autonomous driving sits at the centre of their applications. As this challenge becomes a race among companies directly and indirectly involved with transportation systems, it is pertinent to evaluate how generic, state-of-the-art models perform on datasets built specifically for autonomous driving research. To this end, this article studies the evolution of the YOLO (You Only Look Once) model from its first implementation to the most recent version 3. Experiments carried out on the well-established KITTI Vision Benchmark enable a direct comparison between the newest version and its predecessor. Results show a performance gap between the two versions when they are tested on the same dataset with a similar configuration setup: YOLO version 3 gains accuracy while losing only minimally in detection speed. Conclusions on the applicability of such models to real-world scenarios are drawn in order to anticipate the direction of research in autonomous driving.


Notes

  1. Found at https://pjreddie.com/darknet/yolo/.

  2. Download at http://www.cvlibs.net/datasets/kitti/raw_data.php.


Acknowledgements

This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project no. 037902; Funding Reference: POCI-01-0247-FEDER-037902].

Corresponding author

Correspondence to Filipa Ramos.

Copyright information

© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

Cite this paper

Ramos, F., Correia, A., Rossetti, R.J.F. (2020). Assessing the YOLO Series Through Empirical Analysis on the KITTI Dataset for Autonomous Driving. In: Martins, A., Ferreira, J., Kocian, A. (eds) Intelligent Transport Systems. From Research and Development to the Market Uptake. INTSYS 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 310. Springer, Cham. https://doi.org/10.1007/978-3-030-38822-5_14

  • DOI: https://doi.org/10.1007/978-3-030-38822-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38821-8

  • Online ISBN: 978-3-030-38822-5

  • eBook Packages: Computer Science, Computer Science (R0)
