DOI: 10.1145/2522848.2522893
poster

Predicting speech overlaps from speech tokens and co-occurring body behaviours in dyadic conversations

Published: 09 December 2013

Abstract

This paper deals with speech overlaps in video-recorded spontaneous dyadic conversations. Speech overlaps are common in everyday conversation, so it is important to study their occurrence in different communicative situations and settings and to model them in applied communicative systems.
In the present work, we investigate the frequency and use of speech overlaps in a multimodally annotated corpus of first encounters. Speech overlaps were tagged automatically, and a Bayesian Network learner was trained on the multimodal annotations for two purposes: to determine to what extent overlaps can be predicted, so that conversational devices can deal with them, and to investigate the relation between overlaps, speech tokens and co-occurring body behaviours. The annotations comprise the shape and functions of head movements, facial expressions and body postures.
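The automatic tagging step can be illustrated with a minimal sketch. It assumes each speech token carries a speaker label and start and end times in seconds, and treats a token as overlapping when it shares any time with a token from the other speaker. The `Token` class, its field names and the example timings are hypothetical and are not taken from the corpus or from the authors' tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    speaker: str   # hypothetical speaker label, e.g. "A" or "B"
    start: float   # onset in seconds
    end: float     # offset in seconds
    text: str
    overlaps: bool = False


def tag_overlaps(tokens: List[Token]) -> List[Token]:
    """Mark every token that shares time with a token of the other speaker."""
    for tok in tokens:
        tok.overlaps = any(
            other.speaker != tok.speaker
            and other.start < tok.end
            and tok.start < other.end
            for other in tokens
        )
    return tokens


def overlap_ratio(tokens: List[Token]) -> float:
    """Fraction of tokens tagged as overlapping (0.0 for an empty list)."""
    if not tokens:
        return 0.0
    return sum(t.overlaps for t in tokens) / len(tokens)


# Invented timings for illustration only.
tokens = tag_overlaps([
    Token("A", 0.0, 0.4, "hi"),
    Token("A", 0.5, 1.2, "there"),
    Token("B", 1.0, 1.5, "hello"),
    Token("B", 1.6, 2.0, "nice"),
])
```

The pairwise comparison is quadratic in the number of tokens, which is acceptable for a sketch; a sweep over tokens sorted by onset would scale better on a full corpus.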
In the first encounters, 23% of the speech tokens and 90% of the spoken contributions are overlapping. The best classification results were obtained by training the classifier on the multimodal behaviours (speech and co-occurring head movements, facial expressions and body postures) that surrounded the overlaps. Training the classifier on all speech tokens also gave good results, while adding the shape of co-occurring body behaviours to them did not affect the results. Thus, the behaviour of the conversation participants does not change when there is a speech overlap. This could indicate that most of the overlaps in the first encounters are non-competitive.
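To make the classification setup concrete, the following sketch predicts whether a token overlaps from categorical features such as the token itself and a co-occurring head movement. It uses a plain naive Bayes classifier with add-one smoothing as a simpler stand-in; it is not the Bayesian Network learner used in the paper. The feature names (`token`, `head`) and the training examples are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], bool]  # (feature name -> value, overlap label)


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing (stand-in classifier)."""

    def fit(self, data: List[Example]) -> "NaiveBayes":
        self.labels = Counter(label for _, label in data)
        self.counts = defaultdict(Counter)   # (label, feature) -> value counts
        self.values = defaultdict(set)       # feature -> values seen in training
        for feats, label in data:
            for name, value in feats.items():
                self.counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats: Dict[str, str]) -> bool:
        total = sum(self.labels.values())
        best_label, best_score = None, -math.inf
        for label, n in self.labels.items():
            score = math.log(n / total)  # log prior
            for name, value in feats.items():
                seen = self.counts[(label, name)][value]
                vocab = len(self.values[name]) + 1  # +1 slot for unseen values
                score += math.log((seen + 1) / (n + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; not drawn from the corpus.
train = [
    ({"token": "ja", "head": "nod"}, True),
    ({"token": "ja", "head": "none"}, True),
    ({"token": "mm", "head": "nod"}, True),
    ({"token": "jeg", "head": "none"}, False),
    ({"token": "hedder", "head": "none"}, False),
    ({"token": "jeg", "head": "tilt"}, False),
]
model = NaiveBayes().fit(train)
```

Dropping the `head` key from every feature dictionary and retraining gives a speech-only variant, which is one way to compare feature sets in the spirit of the experiments described above.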


Cited By

  • (2018) The Danish NOMCO corpus. Language Resources and Evaluation 51:2 (463-494). DOI: 10.1007/s10579-016-9371-6. Online publication date: 17-Dec-2018
  • (2014) Alignment of communicative behaviors and familiarity in first encounters. 2014 5th IEEE Conference on Cognitive Infocommunications (CogInfoCom) (185-190). DOI: 10.1109/CogInfoCom.2014.7020443. Online publication date: Nov-2014

    Published In

    ICMI '13: Proceedings of the 15th ACM on International conference on multimodal interaction
    December 2013
    630 pages
    ISBN:9781450321297
    DOI:10.1145/2522848
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. machine learning
    2. multimodal behaviours
    3. multimodal corpora
    4. speech overlaps

    Qualifiers

    • Poster

    Conference

    ICMI '13

    Acceptance Rates

    ICMI '13 Paper Acceptance Rate 49 of 133 submissions, 37%;
    Overall Acceptance Rate 453 of 1,080 submissions, 42%
