Abstract
We propose in this paper a statistical model for predicting listeners' feedbacks in conversation. The first contribution is a study of the prediction of all feedbacks with good accuracy, including those produced in overlap with the speaker. Existing models are good at predicting feedbacks occurring during a pause, but reach a very low success level when all feedbacks are considered. This paper offers a first step towards this complex problem. The second contribution is a model that precisely predicts the type of feedback (generic vs. specific) as well as other specific features (valence, expectation), useful in particular for generating feedbacks in dialogue systems. This work relies on an original corpus.
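The feedback-type prediction described above can be pictured as a binary classifier over multimodal cues. The sketch below is illustrative only: the feature names (pause duration, pitch slope, speaker gaze) and the logistic-regression learner are assumptions for the example, not the paper's actual feature set or model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression by gradient descent.

    Labels: 0 = generic feedback ("mh", "yeah"), 1 = specific feedback.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return "specific" if p >= 0.5 else "generic"

# Toy samples: [pause_duration_s, pitch_slope, speaker_gaze_on_listener]
# (hypothetical values, for illustration only)
X = [[0.8, -0.2, 1], [0.6, -0.1, 1], [0.1, 0.4, 0], [0.2, 0.5, 0]]
y = [0, 0, 1, 1]

w, b = train(X, y)
print(predict(w, b, [0.7, -0.15, 1]))
```

In a real setting the features would come from the multimodal annotations the paper relies on (prosody, gaze, lexical context), and the same scheme extends to the other predicted features (valence, expectation) with one classifier per dimension.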
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Boudin, A., Bertrand, R., Rauzy, S., Ochs, M., Blache, P. (2021). A Multimodal Model for Predicting Conversational Feedbacks. In: Ekštein, K., Pártl, F., Konopík, M. (eds.) Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science, vol. 12848. Springer, Cham. https://doi.org/10.1007/978-3-030-83527-9_46
Print ISBN: 978-3-030-83526-2
Online ISBN: 978-3-030-83527-9