ABSTRACT
To build a conversational interface in which an agent system can communicate smoothly with multiple people, it is essential to understand how the timing of speaking is decided. In this research, we explore participants' head movements, an easy-to-measure nonverbal behavior, for predicting next-utterance timing, i.e., the interval between the end of the current speaker's utterance and the start of the next speaker's utterance, during turn-changing in multi-party meetings. First, we collected data on participants' six-degree-of-freedom head movements and utterances in four-person meetings. Analysis of these data revealed that the amounts of head movement of the current speaker, the next speaker, and the listeners are positively correlated with the utterance interval. Moreover, the degree of synchrony of head position and posture between the current speaker and the next speaker is negatively correlated with the utterance interval. On the basis of these findings, we used the participants' head movements and the synchrony of their head movements as features and devised several prediction models. A model using all of the features performed best and predicted next-utterance timing well. This research thus shows that participants' head movements are useful for predicting next-utterance timing during turn-changing in multi-party meetings.
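The abstract does not give explicit formulas for the features it names. The following is a minimal sketch of how such features might be computed, assuming six-degree-of-freedom head-pose time series (x, y, z position plus roll, pitch, yaw) sampled at a fixed frame rate from a motion tracker. The function names, the per-frame-displacement definition of movement amount, the correlation-based synchrony proxy, and the SVR regressor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVR

def movement_amount(pose, fps=60.0):
    """Mean frame-to-frame displacement of a 6-DOF head-pose series.

    pose: (T, 6) array of x, y, z position and roll, pitch, yaw.
    Position and rotation are summarized separately so that units
    (e.g., cm vs. degrees) are not mixed. Illustrative assumption,
    not the paper's exact definition.
    """
    diffs = np.abs(np.diff(pose, axis=0))
    pos = np.linalg.norm(diffs[:, :3], axis=1).mean() * fps  # position speed
    rot = np.linalg.norm(diffs[:, 3:], axis=1).mean() * fps  # rotation speed
    return pos, rot

def synchrony(pose_a, pose_b):
    """Pearson correlation of two participants' per-frame movement
    magnitudes: a crude proxy for head-movement synchrony."""
    mag_a = np.linalg.norm(np.diff(pose_a, axis=0), axis=1)
    mag_b = np.linalg.norm(np.diff(pose_b, axis=0), axis=1)
    return np.corrcoef(mag_a, mag_b)[0, 1]

# Toy example: random poses standing in for tracker output in a
# 1 s window around the end of the current speaker's utterance.
rng = np.random.default_rng(0)
roles = ["current", "next", "listener1", "listener2"]
poses = {name: rng.normal(size=(60, 6)) for name in roles}

features = []
for name in roles:
    features.extend(movement_amount(poses[name]))          # 8 features
features.append(synchrony(poses["current"], poses["next"]))  # 1 feature

# With many such windows, a regressor maps the feature vector to the
# utterance interval (seconds between the end of the current speaker's
# utterance and the start of the next speaker's utterance).
X = np.tile(features, (20, 1)) + rng.normal(scale=0.1, size=(20, 9))
y = rng.uniform(0.0, 2.0, size=20)  # placeholder intervals
model = SVR().fit(X, y)
print(model.predict(X[:3]))
```

In practice, each window would be aligned to the end of an actual utterance and the model trained on measured intervals rather than the placeholder values used here.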