Gaze Estimation in the 3D Space Using RGB-D Sensors

Towards Head-Pose and User Invariance

Published in: International Journal of Computer Vision

Abstract

We address the problem of 3D gaze estimation within a 3D environment from remote sensors, which is highly valuable for applications in human–human and human–robot interaction. Contrary to most previous works, which are limited to screen-gazing applications, we propose to leverage the depth data of RGB-D cameras to perform accurate head pose tracking, acquire head pose invariance through a 3D rectification process that renders head-pose-dependent eye images into a canonical viewpoint, and compute the line of sight in the 3D space. To address the low resolution of the eye images resulting from the use of remote sensors, we rely on the appearance-based gaze estimation paradigm, which has demonstrated robustness against this factor. In this context, we conduct a comparative study of recent appearance-based strategies within our framework, study the generalization of these methods to unseen individuals, and propose a cross-user eye image alignment technique relying on the direct registration of gaze-synchronized eye images. We demonstrate the validity of our approach through extensive gaze estimation experiments on a public dataset as well as a gaze coding task applied to natural job interviews.
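To illustrate the last step of the pipeline described above — turning a gaze direction estimated in the head coordinate system into a 3D line of sight — the following sketch applies a head-pose rotation and translation to an eyeball center and gaze vector. All names and values here are hypothetical; the paper's full geometry, including the 3D rectification of eye images, is not reproduced.

```python
import numpy as np

def line_of_sight_world(R_head, t_head, o_head, d_head):
    """Map an eyeball center and a unit gaze direction, both expressed
    in the head coordinate system, into the world coordinate system
    using the head pose (rotation R_head, translation t_head)."""
    o_world = R_head @ o_head + t_head             # eyeball center in WCS
    d_world = R_head @ d_head                      # gaze direction in WCS
    return o_world, d_world / np.linalg.norm(d_world)

# Example: head rotated 90 degrees about the vertical (y) axis,
# looking straight ahead in its own frame.
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([0.1, 0.0, 0.5])
o, d = line_of_sight_world(R, t, np.zeros(3), np.array([0.0, 0.0, 1.0]))
# d is now the world x axis: [1, 0, 0]
```

The returned origin and unit direction define the line of sight as a parametric ray in the world frame, which can then be intersected with scene geometry (a screen, an object, another person) to obtain the visual target.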



Notes

  1. www.faceshift.com.

  2. Except when combined with local binary patterns, although the gain in accuracy was negligible: 0.02°.

  3. https://www.idiap.ch/dataset/eyediap.

  4. This is an educated estimate. Location errors for the ball target and the screen dot center are taken as 0, but we needed to add the depth uncertainties or calibration errors. For the eyeball center, we evaluated the error by comparing, in a few frames, the manual annotation of the eyeball center with the projection of \(\mathbf{o}^{\mathrm{WCS}}\).

  5. In a given application, training data would need to be collected appropriately.

  6. For the SVR methods we limited the training set to 1200 samples as using the full set was prohibitively slow.

  7. With the slight difference that the evaluation is conducted on all samples of the test subject’s session, instead of only the second half in the person-specific case.
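The notes above refer to the appearance-based regressors compared in the paper (note 6 mentions SVR among them). As a toy illustration of the appearance-based paradigm in general — not of the paper's actual method — the sketch below estimates gaze angles for a query eye-image feature vector by averaging the gaze labels of its nearest training appearances; all data here is synthetic.

```python
import numpy as np

def knn_gaze(train_feats, train_gaze, query, k=3):
    """Toy appearance-based estimator: average the (yaw, pitch) labels
    of the k training eye appearances closest to the query features."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_gaze[nearest].mean(axis=0)

# Synthetic "eye appearances" (flattened images) and gaze labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))            # 100 samples, 64-dim features
gaze = rng.uniform(-0.5, 0.5, size=(100, 2))  # yaw, pitch in radians
est = knn_gaze(feats, gaze, feats[0], k=1)    # query = a training sample
# With k=1 and an exact match, the estimate equals that training label.
```

This is also where cross-user eye image alignment matters: if eye images from different users are not registered to a common frame, nearest-neighbor distances in appearance space conflate identity with gaze, which is the motivation for the registration technique proposed in the paper.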


Author information


Correspondence to Kenneth A. Funes-Mora.

Additional information

Communicated by M. Hebert.

Electronic supplementary material


Supplementary material 1 (pdf 140 KB)


About this article


Cite this article

Funes-Mora, K.A., Odobez, JM. Gaze Estimation in the 3D Space Using RGB-D Sensors. Int J Comput Vis 118, 194–216 (2016). https://doi.org/10.1007/s11263-015-0863-4
