ABSTRACT
During spontaneous conversation, interlocutors have three possible actions: speaking, remaining silent, or producing feedback. To better understand the mechanisms that make spontaneous interactions successful, this PhD research focuses on conversational feedback: the reactions and responses produced by an interlocutor in the listener role. Feedback is crucial to the quality of an interaction, as it allows interlocutors to share information about understanding, the establishment and updating of common ground, engagement, and shared representations. The objective of this PhD is to propose a multimodal model of conversational feedback. The methodological approach is interdisciplinary, combining machine-learning-based corpus analysis with linguistic interpretation. The resulting model will be evaluated through its integration into an Embodied Conversational Agent (ECA) in perception studies.
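To make the machine-learning side of the corpus-based approach concrete, the sketch below shows one way a feedback-type classifier over fused multimodal features might be set up. The feature names, labels, random data, and random-forest choice are illustrative assumptions, not the model proposed in this research.

```python
# Illustrative sketch only: a minimal multimodal feedback-type classifier.
# Features, labels, and model choice are assumptions for illustration;
# this is not the model described in the abstract above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-utterance features fused across modalities, e.g.:
# [pause_duration_s, f0_slope, speaker_gaze_at_listener, listener_smile]
X = rng.random((200, 4))

# Hypothetical labels: 0 = no feedback, 1 = generic backchannel,
# 2 = specific (e.g. evaluative) feedback
y = rng.integers(0, 3, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

In a real corpus study, the rows of `X` would be feature vectors extracted from annotated audio-video recordings, and the learned feature importances could then feed the linguistic interpretation step.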