
Using Respiration to Predict Who Will Speak Next and When in Multiparty Meetings

Published: 03 August 2016

Abstract

Techniques that use nonverbal behaviors to predict turn-changing situations in multiparty meetings, such as who the next speaker will be and when the next utterance will occur, have received considerable attention in recent research. To build a model for predicting these events, we conducted a study to determine whether respiration can serve as an effective basis for such prediction. Analyses of utterance and respiration data collected from participants in multiparty meetings reveal that the speaker inhales more quickly and deeply after the end of an utterance in turn-keeping than in turn-changing. They also indicate that the listener who will become the next speaker inhales more quickly and deeply in turn-changing than the other listeners do. On the basis of these results, we constructed and evaluated models for predicting the next speaker and the timing of the next utterance in multiparty meetings. The evaluation suggests that the characteristics of the speaker's inhalation immediately after an utterance unit (the times at which the inhalation starts and ends after the end of the utterance unit, and the amplitude, slope, and duration of the inhalation phase) are effective for predicting the next speaker in multiparty meetings. It further suggests that the characteristics of the listeners' inhalation (the times at which the inhalation starts and ends after the end of the utterance unit, and the minimum and maximum inspiration, amplitude, and slope of the inhalation phase) are likewise effective for predicting the next speaker. The start and end times of the next speaker's inhalation are also useful for predicting the timing of the next utterance in turn-changing.
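
As a concrete illustration of the feature set described above, the following is a minimal sketch in Python, not the authors' implementation: it assumes a chest-band respiration waveform sampled at a fixed rate and treats the first sustained rise after an utterance ends as the inhalation phase. The function name, the smoothing window, and the rising-run detector are assumptions made for this example.

    import numpy as np

    def inhalation_features(signal, fs, utterance_end, smooth_sec=0.2):
        """Extract features of the first inhalation phase that follows
        the end of an utterance (times given in seconds)."""
        # Light smoothing so sensor jitter does not split one inhalation
        # into several short rising runs.
        win = max(1, int(smooth_sec * fs))
        smoothed = np.convolve(signal, np.ones(win) / win, mode="same")

        start_idx = int(utterance_end * fs)
        rising = np.diff(smoothed[start_idx:]) > 0  # True while the chest expands
        if not rising.any():
            return None  # no inhalation found after the utterance

        onset = int(np.argmax(rising))  # first rising sample after the utterance
        offset = onset
        while offset < len(rising) and rising[offset]:
            offset += 1                 # walk to the end of the rising run

        t_start = utterance_end + onset / fs   # inhalation start time (s)
        t_end = utterance_end + offset / fs    # inhalation end time (s)
        seg = signal[start_idx + onset : start_idx + offset + 1]
        amplitude = float(seg.max() - seg.min())
        duration = t_end - t_start
        return {
            "start_time": t_start,
            "end_time": t_end,
            "duration": duration,
            "min_inspiration": float(seg.min()),
            "max_inspiration": float(seg.max()),
            "amplitude": amplitude,
            "slope": amplitude / duration,  # duration >= 1/fs here
        }

    # Toy usage: a 0.25 Hz breathing cycle sampled at 20 Hz, with an
    # utterance assumed to end at t = 5.0 s.
    fs = 20.0
    t = np.arange(0.0, 10.0, 1.0 / fs)
    print(inhalation_features(np.sin(2 * np.pi * 0.25 * t), fs, utterance_end=5.0))

Features of this kind, computed for the current speaker and for each listener after every utterance unit, are the sort of inputs the next-speaker and utterance-timing models described above would consume.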




    Published In

    ACM Transactions on Interactive Intelligent Systems, Volume 6, Issue 2
    Regular Articles, Special Issue on Highlights of IUI 2015 (Part 2 of 2) and Special Issue on Highlights of ICMI 2014 (Part 1 of 2)
    August 2016
    282 pages
    ISSN: 2160-6455
    EISSN: 2160-6463
    DOI: 10.1145/2974721
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 03 August 2016
    Accepted: 01 December 2015
    Revised: 01 December 2015
    Received: 01 June 2015
    Published in TIIS Volume 6, Issue 2


    Author Tags

    1. Turn-changing
    2. multiparty meetings
    3. next speaker prediction
    4. next-utterance timing prediction
    5. respiration

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (last 12 months): 13
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 03 Mar 2025


    Cited By

    • (2024) Sensing the Intentions to Speak in VR Group Discussions. Sensors 24:2 (362). DOI: 10.3390/s24020362. Online publication date: 7-Jan-2024.
    • (2024) Coordination of Speaking Opportunities in Virtual Reality: Analyzing Interaction Dynamics and Context-Aware Strategies. Applied Sciences 14:24 (12071). DOI: 10.3390/app142412071. Online publication date: 23-Dec-2024.
    • (2024) Respiration-enhanced Human-Robot Communication. Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, 813-816. DOI: 10.1145/3610978.3640707. Online publication date: 11-Mar-2024.
    • (2024) Breathing and Speech Adaptation: Do Speakers Adapt Toward a Confederate Talking Under Physical Effort? Journal of Speech, Language, and Hearing Research 67:10S (3914-3930). DOI: 10.1044/2023_JSLHR-23-00113. Online publication date: 24-Oct-2024.
    • (2023) Enhancing Human–Robot Collaboration through a Multi-Module Interaction Framework with Sensor Fusion: Object Recognition, Verbal Communication, User of Interest Detection, Gesture and Gaze Recognition. Sensors 23:13 (5798). DOI: 10.3390/s23135798. Online publication date: 21-Jun-2023.
    • (2023) Video-based Respiratory Waveform Estimation in Dialogue: A Novel Task and Dataset for Human-Machine Interaction. Proceedings of the 25th International Conference on Multimodal Interaction, 649-660. DOI: 10.1145/3577190.3614154. Online publication date: 9-Oct-2023.
    • (2023) Multimodal Turn Analysis and Prediction for Multi-party Conversations. Proceedings of the 25th International Conference on Multimodal Interaction, 436-444. DOI: 10.1145/3577190.3614139. Online publication date: 9-Oct-2023.
    • (2023) Are we in sync during turn switch? 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 1-4. DOI: 10.1109/FG57933.2023.10042799. Online publication date: 5-Jan-2023.
    • (2023) Is Turn-Shift Distinguishable with Synchrony? Artificial Intelligence in HCI, 419-432. DOI: 10.1007/978-3-031-35894-4_32. Online publication date: 23-Jul-2023.
    • (2022) Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling. Frontiers in Psychology 13. DOI: 10.3389/fpsyg.2022.774547. Online publication date: 18-Oct-2022.
