
Visual Focus of Attention in Non-calibrated Environments using Gaze Estimation

Published in: International Journal of Computer Vision

Abstract

Estimating a person's focus of attention depends strongly on his or her gaze direction. Here, we propose a new method for estimating visual focus of attention using head rotation, as well as fuzzy fusion of head rotation and eye gaze estimates, in a fully automatic manner, without the need for any special hardware or a priori knowledge of the user, the environment, or the setup. Instead, the proposed system is designed to function under unconstrained conditions, using only simple hardware such as an ordinary web-camera. The system targets human-computer interaction settings in which a person faces a monitor with a camera mounted on top. To this end, we propose two novel techniques for estimating head rotation, based on local and appearance information respectively, and adaptively fuse them in a common framework. The system recognizes head rotational movement under translational movements of the user in any direction, without any knowledge or a priori estimate of the user's distance from the camera or of the camera's intrinsic parameters.
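The abstract's adaptive fusion of head-rotation and eye-gaze cues is detailed in the full paper; as a rough illustration of the general idea only (the paper uses a fuzzy inference system, not this simple scheme, and all names below are hypothetical), a confidence-weighted combination of two angle estimates can be sketched as:

```python
def fuse_estimates(head_yaw, head_conf, gaze_yaw, gaze_conf):
    """Confidence-weighted fusion of two yaw-angle estimates (degrees).

    Illustrative sketch only: the paper fuses the two cues with a
    fuzzy inference system, not this weighted average, and these
    parameter names are hypothetical.
    """
    total = head_conf + gaze_conf
    if total == 0:
        raise ValueError("at least one estimate must carry confidence")
    return (head_conf * head_yaw + gaze_conf * gaze_yaw) / total

# Example: weight the gaze estimate more when its confidence is higher.
fused = fuse_estimates(head_yaw=10.0, head_conf=0.25,
                       gaze_yaw=20.0, gaze_conf=0.75)
# fused == 17.5
```

When one cue becomes unreliable (e.g. the eyes are occluded), its confidence drops and the fused estimate gracefully falls back to the other cue.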


Figs. 1–24 (shown in the full article)


Notes

  1. Here, the word 'common' is used to distinguish from other types of web-cameras, such as wide- or narrow-angle, or infrared.

  2. Here, saturation is used, although different color channels (or combinations thereof) could be used instead.

  3. \(E'_{0}\) and \(M'_{0}\) are the coordinates of \(E_{0}\) and \(M_{0}\), translated onto the second frame so that \(C_{0}\) coincides with \(C_{1}\); this allows Eqs. 8 and 9 to be explained visually.

  4. RMS is also calculated here as a stricter criterion than the mean absolute error (MAE), since it 'punishes' large errors more heavily.

  5. Since the HPEG dataset provides no depth information, we approximated the distance from the camera using the area formed by the LED positions when the subject faces the camera frontally.

  6. The HPEG dataset is freely available at http://www.image.ece.ntua.gr/~stiast/
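Note 4's point that RMS is stricter than MAE can be checked numerically: squaring amplifies large deviations before averaging, so a single outlier inflates RMS while leaving MAE unchanged. A minimal sketch with hypothetical error values:

```python
import math

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rms(errors):
    """Root-mean-square error: squaring amplifies large errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Two hypothetical error sets with the same MAE: one uniform,
# one dominated by a single large outlier.
uniform = [2.0, 2.0, 2.0, 2.0]
outlier = [0.0, 0.0, 0.0, 8.0]

print(mae(uniform), mae(outlier))  # both 2.0
print(rms(uniform))                # 2.0 (equals MAE when errors are uniform)
print(rms(outlier))                # 4.0 (the outlier is 'punished')
```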


Acknowledgments

This research was supported by the FP7 ICT European project SIREN (project no. 258453). We would also like to thank all participants in the HPEG dataset recordings.

Author information

Corresponding author: Stylianos Asteriadis.



Cite this article

Asteriadis, S., Karpouzis, K. & Kollias, S. Visual Focus of Attention in Non-calibrated Environments using Gaze Estimation. Int J Comput Vis 107, 293–316 (2014). https://doi.org/10.1007/s11263-013-0691-3
