DOI: 10.1145/3536221.3557029
short-paper

Interdisciplinary Corpus-based Approach for Exploring Multimodal Conversational Feedback

Published: 07 November 2022

ABSTRACT

During spontaneous conversation, interlocutors have three possible actions: speak, remain silent, or produce feedback. To better understand the mechanisms that make spontaneous interactions successful, this PhD research focuses on conversational feedback, that is, the reactions and responses produced by an interlocutor in a listening position. Feedback is of deep importance for the quality of the interaction: it allows interlocutors to share relevant information about understanding, the establishment and upgrading of common ground, engagement, and shared representations. The objective of the PhD is to propose a multimodal model of conversational feedback. The methodological approach is interdisciplinary, combining machine-learning-based corpus analysis with linguistic interpretation. The resulting model will be evaluated through its integration into an Embodied Conversational Agent (ECA) and through perception studies.

    • Published in

      ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction
      November 2022
      830 pages
      ISBN: 9781450393904
      DOI: 10.1145/3536221

      Copyright © 2022 ACM

      © 2022 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • short-paper
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 453 of 1,080 submissions, 42%
