Abstract
This paper presents the first realtime 3D eye gaze capture method that simultaneously captures the coordinated movement of 3D eye gaze, head pose, and facial expression deformation using a single RGB camera. Our key idea is to complement a realtime 3D facial performance capture system with an efficient 3D eye gaze tracker. We start by automatically detecting important 2D facial features in each frame. The detected features are then used to reconstruct the 3D head pose and large-scale facial deformation using multi-linear expression deformation models. Next, we introduce a novel user-independent classification method for extracting iris and pupil pixels in each frame. We formulate the 3D eye gaze tracker in a Maximum A Posteriori (MAP) framework, which sequentially infers the most probable 3D eye gaze state at each frame. Because the gaze tracker can fail during eye blinks, we further introduce an efficient eye-closure detector to improve its robustness and accuracy. We have tested our system on both live video streams and Internet videos, demonstrating its accuracy and robustness under a variety of uncontrolled lighting conditions and across significant differences in race, gender, face shape, pose, and expression.
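To make the MAP formulation concrete, the sketch below shows per-frame gaze inference in the simplest possible setting: a discretized grid of candidate (yaw, pitch) gaze states, a Gaussian observation likelihood around a detected 2D iris centroid, and a Gaussian temporal prior centered on the previous frame's estimate. This is an illustrative toy with a flat 2D eyeball projection; all function names, parameter values, and the geometry are our own assumptions, not the paper's implementation.

```python
import numpy as np

def map_gaze_update(observed_iris_center, prev_gaze, eyeball_center,
                    eyeball_radius=12.0, sigma_obs=2.0, sigma_prior=0.15):
    """Pick the most probable gaze (yaw, pitch) for one frame by maximizing
    log-posterior = log-likelihood(observation) + log-prior(smoothness).

    Toy geometry: the predicted 2D iris center is the eyeball center offset
    by the gaze direction scaled by the eyeball radius (in pixels).
    """
    yaws = np.linspace(-0.6, 0.6, 61)      # candidate yaw angles (radians)
    pitches = np.linspace(-0.4, 0.4, 41)   # candidate pitch angles (radians)
    best, best_score = None, -np.inf
    for yaw in yaws:
        for pitch in pitches:
            # Predicted 2D iris center under this gaze hypothesis.
            pred = eyeball_center + eyeball_radius * np.array(
                [np.sin(yaw), np.sin(pitch)])
            # Gaussian observation likelihood around the detected centroid.
            ll = -np.sum((pred - observed_iris_center) ** 2) / (2 * sigma_obs ** 2)
            # Gaussian temporal prior around the previous frame's gaze state.
            lp = -((yaw - prev_gaze[0]) ** 2
                   + (pitch - prev_gaze[1]) ** 2) / (2 * sigma_prior ** 2)
            if ll + lp > best_score:
                best_score, best = ll + lp, (yaw, pitch)
    return best
```

In the actual system the likelihood would be computed from the classified iris and pupil pixels and the state would live in 3D, but the structure of the sequential inference (score every candidate state, keep the MAP one, carry it forward as the next frame's prior) is the same.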