Unsupervised Clustering in Multimodal Multiparty Meeting Analysis

Matsusaka, Yosuke; Katagiri, Yasuhiro; Ishizaki, Masato; Enomoto, Mika

doi:10.1007/978-3-642-04793-0_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5509)

Included in the following conference series:

International LREC Workshop on Multimodal Corpora

Abstract

Nonverbal signals such as gaze, head nods, facial expressions, and bodily gestures play significant roles in organizing human interactions. Their significance is even greater in multiparty settings, since many interaction-organizing behaviors, such as turn-taking and participation role assignment, are realized nonverbally. Several projects have collected multimodal corpora of multiparty dialogues [3,4] in order to develop techniques for recognizing meeting events from nonverbal as well as verbal signals (e.g., [11,2]).

The task of annotating nonverbal signals exchanged in conversational interactions poses both theoretical and practical challenges for the development of multimodal corpora. Many projects rely on both manual annotation and automatic signal processing in corpus building. Some projects apply different methods to different types of signals to facilitate the efficient construction of corpora through the division of labor [9]. Others treat manual annotations as ideal values in the process of validating their signal processing methods [7].

References

1. Asano, F., Ogata, J.: Detection and separation of speech events in meeting recordings. In: Proc. Interspeech, pp. 2586–2589 (2006)
2. Ba, S., Odobez, J.-M.: A study on visual focus of attention recognition from head pose in a meeting room. In: Renals, S., Bengio, S., Fiscus, J.G. (eds.) MLMI 2006. LNCS, vol. 4299, pp. 75–87. Springer, Heidelberg (2006)
3. Carletta, J., Ashby, S., Bourban, S., Flynn, M., Guillemot, M., Hain, T., Kadlec, J., Karaiskos, V., Kraaij, W., Kronenthal, M., Lathoud, G., Lincoln, M., Lisowska, A., McCowan, I., Post, W., Reidsma, D., Wellner, P.: The AMI meeting corpus: A pre-announcement. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 28–39. Springer, Heidelberg (2006)
4. Chen, L., Travis Rose, R., Qiao, Y., Kimbara, I., Parrill, F., Welji, H., Han, T.X., Tu, J., Huang, Z., Harper, M., Quek, F., Xiong, Y., McNeill, D., Tuttle, R., Huang, T.: VACE multimodal meeting corpus. In: Renals, S., Bengio, S. (eds.) MLMI 2005. LNCS, vol. 3869, pp. 40–51. Springer, Heidelberg (2006)
5. Kipp, M.: Gesture Generation by Imitation: From Human Behavior to Computer Character Animation. Dissertation.com, Boca Raton, FL (2004)
6. Lee, S., Hayes, M.H.: An application for interactive video abstraction. In: IEEE International Conference on Acoustics Speech and Signal Processing, vol. 5, pp. 17–21 (2004)
7. Martin, J.-C., Caridakis, G., Devillers, L., Karpouzis, K., Abrilian, S.: Manual annotation and automatic image processing of multimodal emotional behaviours: Validating the annotation of TV interviews. In: Proc. LREC 2006, pp. 1127–1132 (2006)
8. Matsusaka, Y.: Recognition of 3 party conversation using prosody and gaze. In: Proc. Interspeech, pp. 1205–1208 (2005)
9. Pianesi, F., Zancanaro, M., Leonardi, C.: Multimodal annotated corpora of consensus decision making meetings. In: LREC 2006 Workshop on Multimodal Corpora, pp. 6–9 (2006)
10. Sas, C., O’Hare, G., Reilly, R.: Virtual environment trajectory analysis: a basis for navigational assistance and scene adaptivity. Future Generation Computer Systems 21, 1157–1166 (2005)
11. Stiefelhagen, R., Yang, J., Waibel, A.: Modeling focus of attention for meeting indexing based on multiple cues. IEEE Transactions on Neural Networks 13(4), 923–938 (2002)
12. Turaga, P.K., Veeraraghavan, A., Chellappa, R.: From videos to verbs: Mining videos for events using a cascade of dynamical systems. In: Proc. of IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (2007)
13. Viola, P., Jones, M.J.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
14. Wang, T., Shum, H., Xu, Y., Zheng, N.: Unsupervised analysis of human gestures. In: Shum, H.-Y., Liao, M., Chang, S.-F. (eds.) PCM 2001. LNCS, vol. 2195, pp. 174–181. Springer, Heidelberg (2001)

Author information

Authors and Affiliations

National Institute of Advanced Industrial Science and Technology, 1-1-1 Umezono, Tsukuba, Ibaraki, Japan
Yosuke Matsusaka
Future University Hakodate, 116-2 Kamedanakano, Hakodate, Hokkaido, Japan
Yasuhiro Katagiri
The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
Masato Ishizaki
Tokyo University of Technology, 1404 Katakura, Hachioji, Tokyo, Japan
Mika Enomoto

Editor information

Editors and Affiliations

Deutsches Forschungszentrum für künstliche Intelligenz (DFKI), Campus D3.2, 66123, Saarbrücken, Germany
Michael Kipp
Laboratoire d’Informatique pour la Mécanique et les Sciences de l’Ingénieur (LIMSI-CNRS), BP 133, 91403, Orsay Cedex, France
Jean-Claude Martin
Faculty of Humanities, Centre for Language Technology, University of Copenhagen, Njalsgade 140-142, 2300, Copenhagen, Denmark
Patrizia Paggio
Computer Science, Human Media Interaction, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands
Dirk Heylen

About this chapter

Cite this chapter

Matsusaka, Y., Katagiri, Y., Ishizaki, M., Enomoto, M. (2009). Unsupervised Clustering in Multimodal Multiparty Meeting Analysis. In: Kipp, M., Martin, J.-C., Paggio, P., Heylen, D. (eds) Multimodal Corpora. MMCorp 2008. Lecture Notes in Computer Science, vol 5509. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04793-0_6

DOI: https://doi.org/10.1007/978-3-642-04793-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04792-3
Online ISBN: 978-3-642-04793-0
eBook Packages: Computer Science, Computer Science (R0)