Abstract
We present a new on-road driving dataset, called “Look Both Ways”, which contains synchronized video of both driver faces and the forward road scene, along with ground-truth gaze data registered from eye-tracking glasses worn by the drivers. Our dataset supports the study of methods for non-intrusively estimating a driver’s focus of attention while driving, an important application area in road safety. A key challenge is that this task requires accurate gaze estimation, but supervised appearance-based gaze estimation methods often transfer poorly to real driving datasets, and in-domain ground truth to supervise them is difficult to gather. We therefore propose a method for self-supervising driver gaze estimation that exploits the geometric consistency between the driver’s gaze direction and the saliency of the scene as observed by the driver. We formulate a 3D geometric learning framework to enforce this consistency, allowing the gaze model to supervise the scene saliency model, and vice versa. We implement a prototype of our method and test it on our dataset, showing that, compared to a supervised approach, it yields better gaze and scene saliency estimation with no additional labels.
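To make the geometric-consistency idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a predicted 3D gaze direction is projected through an assumed pinhole scene camera into the road-scene image, spread into a soft heatmap (a simple exponential kernel standing in for a von Mises-Fisher-style spreading), and compared against the saliency network's prediction with a KL divergence. All names, tensor shapes, intrinsics, and the choice of KL divergence are illustrative assumptions.

```python
# Illustrative sketch only: a gaze/saliency consistency loss in PyTorch.
# Assumptions (not from the paper): pinhole scene camera with intrinsics K,
# a unit 3D gaze direction expressed in the scene-camera frame, and KL
# divergence as the consistency measure.
import torch
import torch.nn.functional as F

def gaze_to_heatmap(gaze_dir, K, height, width, kappa=50.0):
    """Project a unit gaze direction into the scene image and spread it
    into a soft heatmap; larger kappa gives a more peaked distribution."""
    p = K @ gaze_dir                      # ray through camera center -> (3,)
    u, v = p[0] / p[2], p[1] / p[2]       # pixel coordinates of gaze point
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)
    d2 = (xs - u) ** 2 + (ys - v) ** 2    # squared pixel distance to gaze
    heatmap = torch.exp(-kappa * d2 / (height * width))
    return heatmap / heatmap.sum()        # normalize to a distribution

def consistency_loss(gaze_dir, saliency_logits, K):
    """KL divergence between the gaze-induced heatmap and the saliency
    network's predicted distribution over the scene image."""
    h, w = saliency_logits.shape
    gaze_map = gaze_to_heatmap(gaze_dir, K, h, w)
    log_saliency = F.log_softmax(saliency_logits.view(-1), dim=0)
    return F.kl_div(log_saliency, gaze_map.view(-1), reduction="sum")

# Toy usage with random inputs and made-up intrinsics.
K = torch.tensor([[500.0, 0.0, 160.0],
                  [0.0, 500.0, 120.0],
                  [0.0, 0.0, 1.0]])
gaze = F.normalize(torch.tensor([0.1, -0.05, 1.0]), dim=0)
logits = torch.randn(240, 320)
print(consistency_loss(gaze, logits, K))
```

In a full system, gradients from such a loss would flow into both the gaze network and the saliency network, which is the sense in which each model can supervise the other without additional labels.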
Notes
1. We use a von Mises-Fisher density function; its concentration parameter \(\kappa\) plays a role analogous to the inverse variance \(1/\sigma^2\) of a Gaussian density function.
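For reference (the standard definition, not taken from the paper), the von Mises-Fisher density over unit vectors \(\mathbf{x} \in S^2\) with unit mean direction \(\boldsymbol{\mu}\) and concentration \(\kappa \ge 0\) is:

```latex
% Standard von Mises-Fisher density on the unit sphere S^2.
f(\mathbf{x}; \boldsymbol{\mu}, \kappa)
  = \frac{\kappa}{4\pi \sinh \kappa}
    \exp\!\bigl(\kappa \, \boldsymbol{\mu}^{\top} \mathbf{x}\bigr)
% For large kappa this approaches an isotropic Gaussian on the tangent
% plane at mu with variance 1/kappa, which is why kappa behaves like an
% inverse variance rather than a standard deviation.
```

Larger \(\kappa\) concentrates the density around \(\boldsymbol{\mu}\), so \(\kappa\) controls the spread of the modeled gaze distribution.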
Acknowledgement
This research is based on work supported by Toyota Research Institute and the NSF under IIS #1846031. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kasahara, I., Stent, S., Park, H.S. (2022). Look Both Ways: Self-supervising Driver Gaze Estimation and Road Scene Saliency. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13673. Springer, Cham. https://doi.org/10.1007/978-3-031-19778-9_8
DOI: https://doi.org/10.1007/978-3-031-19778-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19777-2
Online ISBN: 978-3-031-19778-9
eBook Packages: Computer Science, Computer Science (R0)