Automatic Detection of Backchannels in Russian Dialogue Speech

  • Conference paper
Speech and Computer (SPECOM 2020)

Abstract

This paper deals with the acoustic properties of backchannels: turns within a dialogue that do not convey information but signal that the speaker is listening to the interlocutor (uh-huh, hm, etc.). The research is based on SibLing, a Russian corpus of dialogue speech, part of which (339 min of speech) was manually segmented into backchannels and non-backchannels. A number of acoustic parameters were then calculated: duration, intensity, fundamental frequency, and pause duration. Our data show that in Russian speech backchannels are shorter and have lower loudness and pitch than non-backchannels. Two classifiers were then tested: CART and SVM. The best performance was achieved using SVM (\(F_1 = 0.651\)) with the following feature set: duration, maximum fundamental frequency, and melodic slope. The most valuable feature was duration.
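As a rough illustration of the feature set and the metric named in the abstract, the sketch below computes duration, maximum fundamental frequency, and a melodic slope for one turn, and evaluates binary predictions with the \(F_1\) score. It is not the authors' implementation. The names `TurnFeatures`, `fit_slope`, `extract_features`, and `f1_score` are hypothetical, the convention that an F0 value of 0 marks an unvoiced frame is an assumption, and melodic slope is assumed here to be the least-squares slope of F0 over time in Hz per second; the paper may define it differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TurnFeatures:
    duration: float       # seconds
    max_f0: float         # Hz
    melodic_slope: float  # Hz per second (assumed definition)


def fit_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Slope of the least-squares line through (time, value) points."""
    n = len(times)
    if n < 2:
        return 0.0
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    spread = sum((t - mean_t) ** 2 for t in times)
    if spread == 0.0:
        return 0.0
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return cov / spread


def extract_features(start: float, end: float,
                     frame_times: Sequence[float],
                     f0_hz: Sequence[float]) -> TurnFeatures:
    """Compute the three features for one turn from its F0 track."""
    # Assumption: unvoiced frames are stored as 0 and are skipped.
    voiced = [(t, f) for t, f in zip(frame_times, f0_hz) if f > 0.0]
    times = [t for t, _ in voiced]
    pitch = [f for _, f in voiced]
    return TurnFeatures(
        duration=end - start,
        max_f0=max(pitch) if pitch else 0.0,
        melodic_slope=fit_slope(times, pitch),
    )


def f1_score(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, each manually segmented turn would yield one `TurnFeatures` record, and the records would serve as input to a classifier such as CART or SVM, with backchannels treated as the positive class when computing `f1_score`.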

Acknowledgments

The research is supported by the Russian Science Foundation (Project 19-78-10046 “Phonetic manifestations of communication accommodation in dialogue”).

Author information

Corresponding author

Correspondence to Pavel Kholiavin.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Kholiavin, P., Mamushina, A., Kocharov, D., Kachkovskaia, T. (2020). Automatic Detection of Backchannels in Russian Dialogue Speech. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_21

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
