
Unconstrained human gaze estimation approach for medium-distance scene based on monocular vision

  • Original article
  • Published in The Visual Computer

Abstract

This paper proposes an unconstrained human gaze estimation approach for medium-distance scenes based on monocular vision. A recurrent convolutional neural network is designed to construct the mapping from the face image to the 3D gaze vector in camera space, thereby estimating the human gaze. First, using the VICON system and a monocular camera, face image sequences and gaze vector sequences of human gaze behavior are collected in various medium-distance situations to construct a time-synchronized gaze estimation dataset. Second, the recurrent convolutional gaze estimation network is designed: taking monocular image sequences as input, it extracts a head pose feature and an eye appearance feature from each frame. Third, the extracted gaze features are fused across the spatial and temporal dimensions to estimate the gaze direction. Finally, cross-validation and experimental evaluation of the estimation errors of the constructed gaze estimation model are carried out. The results show an average error of 7.65° in the Cross-Person experiments and 3.88° in the Person-Specific experiments. Compared with the Gaze360 method, the proposed approach reduces the average gaze angle error by 5.0% in the Cross-Person experiments and 16.6% in the Person-Specific experiments. These experiments verify the effectiveness of the proposed approach and demonstrate its better generalization ability and robustness.
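The errors quoted above (7.65° Cross-Person, 3.88° Person-Specific) are mean angular errors between the estimated and ground-truth 3D gaze vectors, the standard metric in appearance-based gaze estimation. A minimal sketch of that metric is below; the function name and the sample vectors are illustrative, not taken from the paper's code.

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Mean angular error in degrees between batches of predicted and
    ground-truth 3D gaze vectors (shape: N x 3)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Normalize to unit length: only the gaze *direction* matters.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    # Angle between unit vectors via the clipped dot product.
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

# Example: a prediction tilted 45 degrees from the ground truth.
print(angular_error_deg([[0.0, 0.0, 1.0]], [[0.0, 1.0, 1.0]]))  # → 45.0
```

Because the clipping guards against floating-point values slightly outside [-1, 1], the metric is numerically stable even when prediction and ground truth coincide.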



Data availability statement

The datasets generated during and/or analyzed during the current study are available in the 360 repository, https://yunpan.360.cn/surl_y2MuMp85yMD.

References

  1. Huang, J., Zhang, Z., Xie, G., et al.: Real-time precise human-computer interaction system based on gaze estimation and tracking. Wirel. Commun. Mob. Comput. 2021, 8213946 (2021)

  2. Harezlak, K., Kasprowski, P.: Application of eye tracking in medicine: a survey, research issues and challenges. Comput. Med. Imaging Graph. 65, 176–190 (2018)

  3. Yoon, H.S., Hong, H.G., Lee, D.E., et al.: Driver’s eye-based gaze tracking system by one-point calibration. Multimedia Tools Appl. 78(6), 7155–7179 (2019)

  4. Cazzato, D., Leo, M., Distante, C., Voos, H.: When I look into your eyes: a survey on computer vision contributions for human gaze estimation and tracking. Sensors 20(13), 3739 (2020)

  5. Krafka, K., Khosla, A., Kellnhofer, P., et al.: Eye tracking for everyone. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2176–2184 (2016)

  6. Zhang, X., Sugano, Y., Fritz, M., et al.: MPIIGaze: real-world dataset and deep appearance-based gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 162–175 (2019)

  7. Yang, A., Lu, W., Naeem, W., et al.: A sequence models-based real-time multi-person action recognition method with monocular vision. J. Ambient Intell. Hum. Comput. (2021). https://doi.org/10.1007/s12652-021-03399-z

  8. Morimoto, C., Mimica, M.: Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 98(1), 4–24 (2005)

  9. Ranjan, R., De Mello, S., Kautz, J.: Light-weight head pose invariant gaze tracking. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2237–22378 (2018)

  10. Sesma-Sanchez, L., Villanueva, A., Cabeza, R.: Gaze estimation interpolation methods based on binocular data. IEEE Trans. Biomed. Eng. 59(8), 2235–2243 (2012)

  11. Li, J., Li, S.: Gaze estimation from color image based on the eye model with known head pose. IEEE Trans. Hum.-Mach. Syst. 46(3), 414–423 (2016)

  12. Lu, H., Chao, W., Chen, Y.: Gaze tracking by binocular vision and LBP features. In: 2008 International Conference on Pattern Recognition (ICPR), pp. 1–4 (2008)

  13. Huang, Q., Veeraraghavan, A., Sabharwal, A.: TabletGaze: dataset and analysis for unconstrained appearance-based gaze estimation in mobile tablets. Mach. Vis. Appl. 28, 445–461 (2017)

  14. Zhang, X., Sugano, Y., Fritz, M., et al.: Appearance-based gaze estimation in the wild. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511–4520 (2015)

  15. Cheng, Y., Zhang, X., Lu, F., et al.: Gaze estimation by exploring two-eye asymmetry. IEEE Trans. Image Process. 29, 5259–5272 (2020)

  16. Zhou, X., Jiang, J., Liu, Q., et al.: Learning a 3D gaze estimator with adaptive weighted strategy. IEEE Access 8, 82142–82152 (2020)

  17. Liu, G., Yu, Y., Mora, K.A.F., et al.: A differential approach for gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 43(3), 1092–1099 (2021)

  18. Cheng, Y., Huang, S., Wang, F., et al.: A coarse-to-fine adaptive network for appearance-based gaze estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20), pp. 10623–10630 (2020)

  19. Cheng, Y., Lu, F.: Gaze estimation using transformer. arXiv:2105.14424 (2021)

  20. Cheng, Y., Bao, Y., Lu, F.: Puregaze: purifying gaze feature for generalizable gaze estimation. In: 2022 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-22), pp. 436–443 (2022)

  21. Tsai, R.Y., Lenz, R.K.: A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 5(3), 345–358 (1989)

  22. Baltrušaitis, T., Robinson, P., Morency, L. P.: Continuous conditional neural fields for structured regression. In: 2014 European Conference on Computer Vision, pp. 593–608 (2014)

  23. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision 81, 155–166 (2009)

  24. Sugano, Y., Matsushita, Y., Sato, Y.: Learning-by-synthesis for appearance-based 3D gaze estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1821–1828 (2014)

  25. Bao, J., Liu, B., Yu, J.: An Individual-difference-aware model for cross-person gaze estimation. IEEE Trans. Image Process. 31, 3322–3333 (2022)

  26. Kellnhofer, P., Recasens, A., et al.: Gaze360: physically unconstrained gaze estimation in the wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6912–6921 (2019)

Acknowledgments

This work was supported by the Natural Science Foundation of Shanghai under Grant 22ZR1424200.

Author information

Corresponding author

Correspondence to Aolei Yang.

Ethics declarations

Conflict of interest

All authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (MP4 4184 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, A., Jin, Z., Guo, S. et al. Unconstrained human gaze estimation approach for medium-distance scene based on monocular vision. Vis Comput 40, 73–85 (2024). https://doi.org/10.1007/s00371-022-02766-x
