Research Article
DOI: 10.1145/3125739.3125765

Prediction of Next-Utterance Timing using Head Movement in Multi-Party Meetings

Published: 27 October 2017

ABSTRACT

To build a conversational interface in which an agent system can communicate smoothly with multiple people, it is essential to know how the timing of speaking is decided. In this research, we explore participants' head movements, an easy-to-measure nonverbal behavior, to predict the next-utterance timing, i.e., the interval between the end of the current speaker's utterance and the start of the next speaker's utterance, during turn-changing in multi-party meetings. First, we collected data on participants' six-degree-of-freedom head movements and utterances in four-person meetings. The analysis revealed that the amounts of head movement of the current speaker, the next speaker, and the listeners are positively correlated with the utterance interval. Moreover, the degree of synchrony of head position and posture between the current speaker and the next speaker is negatively correlated with the utterance interval. On the basis of these findings, we used the participants' head movements and the synchrony of their head movements as features and devised several prediction models. A model using all the features performed best and predicted the next-utterance timing well. This research therefore shows that participants' head movements are useful for predicting the next-utterance timing during turn-changing in multi-party meetings.
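To make the described pipeline concrete, below is a minimal sketch, not the paper's implementation, of how such features might be computed from 6-DOF head-tracking time series and fed to a regressor that predicts the utterance interval. The window length, the movement-amount and synchrony definitions, and the choice of support vector regression are all assumptions for illustration; the exact formulations are not given in the abstract.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def movement_amount(track, window):
    """Total frame-to-frame displacement over the last `window` frames.

    `track` is a (T, 6) array of 6-DOF head samples
    (x, y, z position plus azimuth, elevation, roll).
    """
    seg = track[-window:]
    return float(np.abs(np.diff(seg, axis=0)).sum())


def synchrony(track_a, track_b, window):
    """Mean per-channel Pearson correlation between two head tracks,
    a simple stand-in for the paper's synchrony measure (assumption)."""
    a, b = track_a[-window:], track_b[-window:]
    corrs = [np.corrcoef(a[:, d], b[:, d])[0, 1] for d in range(a.shape[1])]
    return float(np.nanmean(corrs))


def turn_change_features(current, nxt, listeners, window=60):
    """Feature vector for one turn change: movement amounts of the current
    speaker, next speaker, and listeners, plus current/next synchrony."""
    return np.array([
        movement_amount(current, window),
        movement_amount(nxt, window),
        np.mean([movement_amount(l, window) for l in listeners]),
        synchrony(current, nxt, window),
    ])


# Hypothetical usage: X has one feature row per observed turn change,
# y is the utterance interval in seconds.
# model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
# model.fit(X_train, y_train)
# predicted_interval = model.predict(X_test)
```

The expected correlations from the abstract would show up here as the model learning a positive weight on the movement-amount features and a negative one on the synchrony feature.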


Published in

HAI '17: Proceedings of the 5th International Conference on Human Agent Interaction
October 2017, 550 pages
ISBN: 9781450351133
DOI: 10.1145/3125739
Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 121 of 404 submissions, 30%
