Abstract
This paper presents a novel face tracker and verifies its effectiveness for analyzing group meetings. In meeting scene analysis, face direction is an important clue for assessing the visual attention of meeting participants. The face tracker, called STCTracker (Sparse Template Condensation Tracker), estimates face position and pose by matching face templates within a particle filter framework. STCTracker is robust against large head rotations, up to ±60 degrees in the horizontal direction, with a relatively small mean deviation error. It can also track multiple faces simultaneously in real time by using a modern GPU (Graphics Processing Unit), e.g. 6 faces at about 28 frames/second on a single PC. In addition, it automatically builds 3-D face templates when the tracker is initialized. This paper evaluates the tracking errors and verifies the effectiveness of STCTracker for meeting scene analysis in terms of conversation structures, gaze directions, and the structure of cross-modal interactions involving head gestures and utterances. Experiments confirm that STCTracker achieves performance broadly comparable to that of a magnetic-sensor-based motion capture system, which is cumbersome for users.
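To make the idea of estimating pose within a particle filter more concrete, the following is a minimal sketch of a generic predict, weight, and resample cycle. It is not the STCTracker implementation: the state layout (x, y, yaw), the random-walk motion model, the Gaussian `toy_score` that stands in for sparse template matching, and all function names and parameter values are assumptions made purely for illustration.

```python
import math
import random


def condensation_step(particles, score, noise_std, rng):
    """One predict-weight-resample cycle of a generic particle filter."""
    # Predict: diffuse each hypothesis with Gaussian noise
    # (a random-walk motion model, assumed here for simplicity).
    predicted = [[v + rng.gauss(0.0, noise_std) for v in p] for p in particles]

    # Weight: score each hypothesis against the observation and normalize.
    weights = [score(p) for p in predicted]
    total = sum(weights)
    if total <= 0.0:
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted hypotheses.
    dim = len(predicted[0])
    estimate = [sum(w * p[i] for w, p in zip(weights, predicted))
                for i in range(dim)]

    # Resample: draw a new particle set in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return [list(p) for p in resampled], estimate


# Hypothetical ground-truth state: image position (x, y) and yaw in degrees.
TRUE_POSE = (40.0, 55.0, 30.0)


def toy_score(p):
    """Stand-in for a template-matching score (NOT sparse template matching):
    a Gaussian that peaks when the hypothesis equals TRUE_POSE."""
    dx = (p[0] - TRUE_POSE[0]) / 5.0
    dy = (p[1] - TRUE_POSE[1]) / 5.0
    dyaw = (p[2] - TRUE_POSE[2]) / 10.0
    return math.exp(-0.5 * (dx * dx + dy * dy + dyaw * dyaw))


def run_demo(num_particles=500, num_steps=50, seed=0):
    rng = random.Random(seed)
    # Initialize hypotheses uniformly over the image and a ±60 degree yaw range.
    particles = [[rng.uniform(0.0, 100.0),
                  rng.uniform(0.0, 100.0),
                  rng.uniform(-60.0, 60.0)] for _ in range(num_particles)]
    estimate = None
    for _ in range(num_steps):
        particles, estimate = condensation_step(particles, toy_score, 2.0, rng)
    return estimate


final_estimate = run_demo()
print("estimated (x, y, yaw):", [round(v, 1) for v in final_estimate])
```

In a real tracker the score would come from comparing image intensities against a face template for each hypothesized pose, and evaluating that score independently per particle is the kind of workload that lends itself to GPU parallelization.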
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Otsuka, K., Yamato, J. (2008). Fast and Robust Face Tracking for Analyzing Multiparty Face-to-Face Meetings. In: Popescu-Belis, A., Stiefelhagen, R. (eds) Machine Learning for Multimodal Interaction. MLMI 2008. Lecture Notes in Computer Science, vol 5237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85853-9_2
DOI: https://doi.org/10.1007/978-3-540-85853-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85852-2
Online ISBN: 978-3-540-85853-9
eBook Packages: Computer Science, Computer Science (R0)