
Modeling driving task-relevant attention for intelligent vehicles using triplet ranking

  • Original Paper
  • Published in: Machine Vision and Applications

Abstract

Understanding driving task-relevant attention (i.e., when to pay more attention) is beneficial for improved safety in intelligent vehicles. Modeling driving task-relevant attention is challenging because it requires a collective understanding of multiple environmental risk factors in a given traffic scene. We formulate this research problem as a learning-to-rank task: given a traffic scene from a vehicle-mounted camera, we output an attention score that represents the required driver attention level. In this manner, we explicitly enforce the inherent ordering of the required attention levels as well as a clear separation between them. First, we learn a ranking function by contrasting two traffic scenes at a time using a pairwise ranking loss. Then, we introduce a novel triplet ranking architecture that models driving task-relevant attention with improved accuracy and training time. We evaluate the proposed method on traffic scenes from the Berkeley DeepDrive dataset. Experimental results demonstrate that it outperforms existing classifier-based methods by a significant margin.
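The abstract names two training objectives: a pairwise ranking loss that contrasts two scenes at a time, and a triplet ranking formulation. The paper's exact backbone, margins, and scoring head are not given on this page, so the sketch below is only a minimal illustration of how such objectives can be wired up in PyTorch; the ResNet-18 backbone, the unit margins, and the choice to apply the triplet loss in embedding space are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torchvision.models as models

class AttentionRanker(nn.Module):
    """Maps a traffic-scene image to a scalar 'required attention' score."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # assumed backbone, not the paper's
        backbone.fc = nn.Identity()               # expose the 512-d feature vector
        self.backbone = backbone
        self.score = nn.Linear(512, 1)            # scalar attention score

    def forward(self, x):
        return self.score(self.backbone(x)).squeeze(-1)

model = AttentionRanker()

# Pairwise ranking: the scene that requires more attention should score higher.
pairwise_loss = nn.MarginRankingLoss(margin=1.0)   # margin value is an assumption
scene_high = torch.randn(4, 3, 224, 224)           # higher-attention scenes
scene_low  = torch.randn(4, 3, 224, 224)           # lower-attention scenes
target = torch.ones(4)                             # +1: first argument should rank higher
loss_pair = pairwise_loss(model(scene_high), model(scene_low), target)

# Triplet ranking: anchor and positive share an attention level, the negative
# does not; the loss pulls same-level embeddings together and pushes the
# negative away by a margin, which also encourages separation between levels.
triplet_loss = nn.TripletMarginLoss(margin=1.0)
anchor   = torch.randn(4, 3, 224, 224)
positive = torch.randn(4, 3, 224, 224)
negative = torch.randn(4, 3, 224, 224)
loss_triplet = triplet_loss(model.backbone(anchor),
                            model.backbone(positive),
                            model.backbone(negative))

(loss_pair + loss_triplet).backward()

The block only shows the loss plumbing; how the ranked scores are mapped back to discrete attention levels, and how the reported accuracy and training-time gains are measured, is described in the full article.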



Author information


Corresponding author

Correspondence to Jayani Withanawasam.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Withanawasam, J., Kamijo, S. Modeling driving task-relevant attention for intelligent vehicles using triplet ranking. Machine Vision and Applications 34, 91 (2023). https://doi.org/10.1007/s00138-023-01437-8
