ABSTRACT
To build a conversational interface in which an agent system can communicate smoothly with multiple people, it is essential to understand how the timing of speaking is decided. In this research, we explore participants' head movements, an easy-to-measure nonverbal behavior, for predicting next-utterance timing, i.e., the interval between the end of the current speaker's utterance and the start of the next speaker's utterance, during turn-changing in multi-party meetings. First, we collected data on participants' six-degree-of-freedom head movements and utterances in four-person meetings. Analysis of these data revealed that the amounts of head movement of the current speaker, the next speaker, and the listeners are positively correlated with the utterance interval. Moreover, the degree of synchrony of head position and posture between the current speaker and the next speaker is negatively correlated with the utterance interval. On the basis of these findings, we used the participants' head movements and the synchrony of their head movements as features and devised several prediction models. A model using all of the features performed best and predicted next-utterance timing well. This research thus shows that participants' head movements are useful for predicting next-utterance timing during turn-changing in multi-party meetings.
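The abstract does not give explicit formulas for the features it names. The following is a minimal sketch of how such features might be computed, assuming six-degree-of-freedom head-pose time series (x, y, z position plus roll, pitch, yaw) sampled at a fixed frame rate from a motion tracker. The function names, the per-frame-displacement definition of movement amount, the correlation-based synchrony proxy, and the SVR regressor are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVR

def movement_amount(pose, fps=60.0):
    """Mean frame-to-frame displacement of a 6-DOF head-pose series.

    pose: (T, 6) array of x, y, z position and roll, pitch, yaw.
    Position and rotation are summarized separately so that units
    (e.g., cm vs. degrees) are not mixed. Illustrative assumption,
    not the paper's exact definition.
    """
    diffs = np.abs(np.diff(pose, axis=0))
    pos = np.linalg.norm(diffs[:, :3], axis=1).mean() * fps  # position speed
    rot = np.linalg.norm(diffs[:, 3:], axis=1).mean() * fps  # rotation speed
    return pos, rot

def synchrony(pose_a, pose_b):
    """Pearson correlation of two participants' per-frame movement
    magnitudes: a crude proxy for head-movement synchrony."""
    mag_a = np.linalg.norm(np.diff(pose_a, axis=0), axis=1)
    mag_b = np.linalg.norm(np.diff(pose_b, axis=0), axis=1)
    return np.corrcoef(mag_a, mag_b)[0, 1]

# Toy example: random poses standing in for tracker output in a
# 1 s window around the end of the current speaker's utterance.
rng = np.random.default_rng(0)
roles = ["current", "next", "listener1", "listener2"]
poses = {name: rng.normal(size=(60, 6)) for name in roles}

features = []
for name in roles:
    features.extend(movement_amount(poses[name]))          # 8 features
features.append(synchrony(poses["current"], poses["next"]))  # 1 feature

# With many such windows, a regressor maps the feature vector to the
# utterance interval (seconds between the end of the current speaker's
# utterance and the start of the next speaker's utterance).
X = np.tile(features, (20, 1)) + rng.normal(scale=0.1, size=(20, 9))
y = rng.uniform(0.0, 2.0, size=20)  # placeholder intervals
model = SVR().fit(X, y)
print(model.predict(X[:3]))
```

In practice, each window would be aligned to the end of an actual utterance and the model trained on measured intervals rather than the placeholder values used here.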