Abstract
This paper proposes an unconstrained human gaze estimation approach for medium-distance scenes based on monocular vision. A recurrent convolutional neural network is designed to map face images to 3D gaze vectors in camera space, thereby estimating the human gaze. First, using the VICON motion-capture system and a monocular camera, face image sequences and gaze vector sequences of human gaze behavior are collected under various medium-distance conditions to construct a time-synchronized gaze estimation dataset. Second, the recurrent convolutional gaze estimation network is designed: with monocular image sequences as input, head pose features and eye appearance features are extracted from each frame. Third, the extracted gaze features are fused along the spatial and temporal dimensions to estimate the direction of the human gaze. Finally, the estimation error of the constructed gaze estimation model is evaluated through cross-validation experiments. The results show an average error of 7.65° in the cross-person experiments and 3.88° in the person-specific experiments. Compared with the Gaze360 method, the proposed approach reduces the average gaze angle error by 5.0% in the cross-person experiments and by 16.6% in the person-specific experiments. These results verify the effectiveness of the proposed approach and its improved generalization ability and robustness.
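The angular errors reported above (7.65° and 3.88°) are conventionally computed as the angle between the predicted and ground-truth 3D gaze vectors. A minimal sketch of that standard metric (the function name and vector layout are illustrative choices, not taken from the paper):

```python
import math

def angular_error_deg(pred, gt):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * g for p, g in zip(pred, gt))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(g * g for g in gt))
    # Clamp to [-1, 1] to guard against floating-point overshoot in acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Identical directions give 0 degrees; orthogonal directions give 90 degrees.
print(angular_error_deg((0.0, 0.0, -1.0), (0.0, 0.0, -1.0)))  # 0.0
print(angular_error_deg((0.0, 0.0, -1.0), (1.0, 0.0, 0.0)))   # 90.0
```

In cross-person evaluation this error is averaged over all frames of held-out subjects, which is why it is the quantity compared across methods.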
Data availability statements
The datasets generated during and/or analyzed during the current study are available in the 360 repository, https://yunpan.360.cn/surl_y2MuMp85yMD.
References
Huang, J., Zhang, Z., Xie, G., et al.: Real-time precise human-computer interaction system based on gaze estimation and tracking. Wirel. Commun. Mob. Comput. 2021, 8213946 (2021)
Harezlak, K., Kasprowski, P.: Application of eye tracking in medicine: a survey, research issues and challenges. Comput. Med. Imaging Graph.: Offic. J. Comput. Med. Imaging Soc. 65, 176–190 (2018)
Yoon, H.S., Hong, H.G., Lee, D.E., et al.: Driver’s eye-based gaze tracking system by one-point calibration. Multimedia Tools Appl. 78(6), 7155–7179 (2019)
Cazzato, D., Leo, M., Distante, C., Voos, H.: When I look into your eyes: a survey on computer vision contributions for human gaze estimation and tracking. Sensors 20(13), 3739 (2020)
Krafka, K., Khosla, A., Kellnhofer, P., et al.: Eye tracking for everyone. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2176–2184 (2016)
Zhang, X., Sugano, Y., Fritz, M., et al.: MPIIGaze: real-world dataset and deep appearance-based gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 41(1), 162–175 (2019)
Yang, A., Lu, W., Naeem, W., et al.: A sequence models-based real-time multi-person action recognition method with monocular vision. J. Ambient Intell. Hum. Comput. (2021). https://doi.org/10.1007/s12652-021-03399-z
Morimoto, C., Mimica, M.: Eye gaze tracking techniques for interactive applications. Comput. Vis. Image Underst. 98(1), 4–24 (2005)
Ranjan, R., De Mello, S., Kautz, J.: Light-weight head pose invariant gaze tracking. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2237–22378 (2018)
Sesma-Sanchez, L., Villanueva, A., Cabeza, R.: Gaze estimation interpolation methods based on binocular data. IEEE Trans. Biomed. Eng. 59(8), 2235–2243 (2012)
Li, J., Li, S.: Gaze estimation from color image based on the eye model with known head pose. IEEE Trans. Hum.-Mach. Syst. 46(3), 414–423 (2016)
Lu, H., Chao, W., Chen, Y.: Gaze tracking by binocular vision and LBP features. In: 2008 International Conference on Pattern Recognition (ICPR), pp. 1–4 (2008)
Huang, Q., Veeraraghavan, A., Sabharwal, A.: TabletGaze: dataset and analysis for unconstrained appearance-based gaze estimation in mobile tablets. Mach. Vis. Appl. 28, 445–461 (2017)
Zhang, X., Sugano, Y., Fritz, M., et al.: Appearance-based gaze estimation in the wild. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511–4520 (2015)
Cheng, Y., Zhang, X., Lu, F., et al.: Gaze estimation by exploring two-eye asymmetry. IEEE Trans. Image Process. 29, 5259–5272 (2020)
Zhou, X., Jiang, J., Liu, Q., et al.: Learning a 3D gaze estimator with adaptive weighted strategy. IEEE Access 8, 82142–82152 (2020)
Liu, G., Yu, Y., Mora, K.A.F., et al.: A differential approach for gaze estimation. IEEE Trans. Pattern Anal. Mach. Intell. 43(3), 1092–1099 (2021)
Cheng, Y., Huang, S., Wang, F., et al.: A coarse-to-fine adaptive network for appearance-based gaze estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-20), pp. 10623–10630 (2020)
Cheng, Y., Lu, F.: Gaze estimation using transformer. arXiv:2105.14424 (2021)
Cheng, Y., Bao, Y., Lu, F.: PureGaze: purifying gaze feature for generalizable gaze estimation. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-22), pp. 436–443 (2022)
Tsai, R.Y., Lenz, R.K.: A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Robot. Autom. 5(3), 345–358 (1989)
Baltrušaitis, T., Robinson, P., Morency, L. P.: Continuous conditional neural fields for structured regression. In: 2014 European Conference on Computer Vision, pp. 593–608 (2014)
Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vision 81, 155–166 (2009)
Sugano, Y., Matsushita, Y., Sato, Y.: Learning-by-synthesis for appearance-based 3D gaze estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1821–1828 (2014)
Bao, J., Liu, B., Yu, J.: An Individual-difference-aware model for cross-person gaze estimation. IEEE Trans. Image Process. 31, 3322–3333 (2022)
Kellnhofer, P., Recasens, A., et al.: Gaze360: physically unconstrained gaze estimation in the wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6912–6921 (2019)
Acknowledgments
This work is supported by the Natural Science Foundation of Shanghai under Grant 22ZR1424200.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file1 (MP4 4184 kb)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, A., Jin, Z., Guo, S. et al. Unconstrained human gaze estimation approach for medium-distance scene based on monocular vision. Vis Comput 40, 73–85 (2024). https://doi.org/10.1007/s00371-022-02766-x