A Multimodal Model for Predicting Conversational Feedbacks

  • Conference paper
Text, Speech, and Dialogue (TSD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12848)

Abstract

In this paper, we propose a statistical model for predicting a listener's feedbacks in conversation. The first contribution of the paper is a study of the prediction of all feedbacks, including those in overlap with the speaker, with good accuracy. Existing models predict feedbacks during a pause well, but reach a very low success level when all feedbacks are considered. This paper offers a first step towards this complex problem. The second contribution is a model that precisely predicts the type of feedback (generic vs. specific) as well as other specific features (valence, expectation), which are useful in particular for generating feedbacks in dialogue systems. This work relies on an original corpus.
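
The abstract frames feedback prediction as a supervised classification problem over multimodal features of the interaction. The sketch below is not the authors' model; it is a minimal illustration of that task under stated assumptions: hypothetical feature names (pause duration, pitch slope, gaze, smile intensity, IPU length), synthetic data, and an off-the-shelf scikit-learn classifier for the no-feedback / generic / specific decision.

    # Minimal illustrative sketch (not the authors' implementation): a supervised
    # classifier mapping hypothetical multimodal features of the speaker's
    # behaviour to a feedback decision for the listener.
    # Feature names and the synthetic data below are assumptions for illustration.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    rng = np.random.default_rng(0)

    # Hypothetical feature vector per time window of the speaker's turn:
    # [pause_duration, pitch_slope, gaze_at_listener, smile_intensity, ipu_length]
    X = rng.random((500, 5))

    # Hypothetical labels: 0 = no feedback, 1 = generic feedback, 2 = specific feedback
    y = rng.integers(0, 3, size=500)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Any standard classifier would do here; a random forest is used only to
    # keep the example self-contained and dependency-light.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    print(classification_report(y_test, clf.predict(X_test)))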


Author information

Correspondence to Philippe Blache.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Boudin, A., Bertrand, R., Rauzy, S., Ochs, M., Blache, P. (2021). A Multimodal Model for Predicting Conversational Feedbacks. In: Ekštein, K., Pártl, F., Konopík, M. (eds) Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science (LNAI), vol 12848. Springer, Cham. https://doi.org/10.1007/978-3-030-83527-9_46

  • DOI: https://doi.org/10.1007/978-3-030-83527-9_46

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83526-2

  • Online ISBN: 978-3-030-83527-9

  • eBook Packages: Computer Science, Computer Science (R0)
