Abstract
This paper gives an overview of our recent progress in multimodal conversation scene analysis and discusses its future role in designing better human-to-human communication systems. Conversation scene analysis aims to automatically describe conversation scenes from the participants' multimodal nonverbal behaviors, as captured by cameras and microphones. So far, the author’s group has proposed a research framework based on the probabilistic modeling of conversation phenomena to solve several basic problems: speaker diarization (“who is speaking when”), addressee identification (“who is talking to whom”), interaction structure estimation (“who is responding to whom”), estimation of the visual focus of attention (VFOA, “who is looking at whom”), and inference of interpersonal emotion (such as “who has empathy/antipathy with whom”). These are inferred from observed multimodal behaviors, including utterances, head pose, head gestures, eye gaze, and facial expressions. We outline our approach and discuss how conversation scene analysis can be extended to enhance the design process of computer-mediated communication systems.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Otsuka, K. (2011). Multimodal Conversation Scene Analysis for Understanding People’s Communicative Behaviors in Face-to-Face Meetings. In: Salvendy, G., Smith, M.J. (eds) Human Interface and the Management of Information. Interacting with Information. Human Interface 2011. Lecture Notes in Computer Science, vol 6772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21669-5_21
DOI: https://doi.org/10.1007/978-3-642-21669-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21668-8
Online ISBN: 978-3-642-21669-5
eBook Packages: Computer Science, Computer Science (R0)