Gaze Estimation in the 3D Space Using RGB-D Sensors

Towards Head-Pose and User Invariance

Published in: International Journal of Computer Vision

Abstract

We address the problem of 3D gaze estimation within a 3D environment from remote sensors, which is highly valuable for applications in human–human and human–robot interaction. Contrary to most previous works, which are limited to screen-gazing applications, we propose to leverage the depth data of RGB-D cameras to perform accurate head pose tracking, acquire head pose invariance through a 3D rectification process that renders head-pose-dependent eye images into a canonical viewpoint, and compute the line of sight in the 3D space. To address the low resolution of the eye images resulting from the use of remote sensors, we rely on the appearance-based gaze estimation paradigm, which has demonstrated robustness against this factor. In this context, we conduct a comparative study of recent appearance-based strategies within our framework, study the generalization of these methods to unseen individuals, and propose a cross-user eye image alignment technique relying on the direct registration of gaze-synchronized eye images. We demonstrate the validity of our approach through extensive gaze estimation experiments on a public dataset as well as a gaze coding task applied to natural job interviews.
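To illustrate the last step of the pipeline described above — turning a gaze direction estimated in the head coordinate system into a 3D line of sight — the following sketch applies a head-pose rotation and translation to an eyeball center and gaze vector. All names and values here are hypothetical; the paper's full geometry, including the 3D rectification of eye images, is not reproduced.

```python
import numpy as np

def line_of_sight_world(R_head, t_head, o_head, d_head):
    """Map an eyeball center and a unit gaze direction, both expressed
    in the head coordinate system, into the world coordinate system
    using the head pose (rotation R_head, translation t_head)."""
    o_world = R_head @ o_head + t_head             # eyeball center in WCS
    d_world = R_head @ d_head                      # gaze direction in WCS
    return o_world, d_world / np.linalg.norm(d_world)

# Example: head rotated 90 degrees about the vertical (y) axis,
# looking straight ahead in its own frame.
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([0.1, 0.0, 0.5])
o, d = line_of_sight_world(R, t, np.zeros(3), np.array([0.0, 0.0, 1.0]))
# d is now the world x axis: [1, 0, 0]
```

The returned origin and unit direction define the line of sight as a parametric ray in the world frame, which can then be intersected with scene geometry (a screen, an object, another person) to obtain the visual target.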



Notes

  1. www.faceshift.com.

  2. Except when combined with local binary patterns, although the gain in accuracy was negligible: 0.02°.

  3. https://www.idiap.ch/dataset/eyediap.

  4. This is an educated estimate. Location errors for the ball target and the screen dot center are taken as 0, but we needed to add the depth uncertainties or calibration errors. For the eyeball center, we evaluated the error by comparing, in a few frames, the manual annotation of the eyeball center with the projection of \(\mathbf{o}^{\mathrm{WCS}}\).

  5. In a given application, training data would need to be collected appropriately.

  6. For the SVR methods we limited the training set to 1200 samples as using the full set was prohibitively slow.

  7. With the slight difference that the evaluation is conducted on all samples of the test subject’s session, instead of only the second half in the person-specific case.
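The notes above refer to the appearance-based regressors compared in the paper (note 6 mentions SVR among them). As a toy illustration of the appearance-based paradigm in general — not of the paper's actual method — the sketch below estimates gaze angles for a query eye-image feature vector by averaging the gaze labels of its nearest training appearances; all data here is synthetic.

```python
import numpy as np

def knn_gaze(train_feats, train_gaze, query, k=3):
    """Toy appearance-based estimator: average the (yaw, pitch) labels
    of the k training eye appearances closest to the query features."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return train_gaze[nearest].mean(axis=0)

# Synthetic "eye appearances" (flattened images) and gaze labels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 64))            # 100 samples, 64-dim features
gaze = rng.uniform(-0.5, 0.5, size=(100, 2))  # yaw, pitch in radians
est = knn_gaze(feats, gaze, feats[0], k=1)    # query = a training sample
# With k=1 and an exact match, the estimate equals that training label.
```

This is also where cross-user eye image alignment matters: if eye images from different users are not registered to a common frame, nearest-neighbor distances in appearance space conflate identity with gaze, which is the motivation for the registration technique proposed in the paper.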


Author information


Correspondence to Kenneth A. Funes-Mora.

Additional information

Communicated by M. Hebert.

Electronic supplementary material


Supplementary material 1 (pdf 140 KB)


About this article


Cite this article

Funes-Mora, K.A., Odobez, JM. Gaze Estimation in the 3D Space Using RGB-D Sensors. Int J Comput Vis 118, 194–216 (2016). https://doi.org/10.1007/s11263-015-0863-4
