Abstract
This paper examines the acoustic properties of backchannels: turns within a dialogue that do not convey information but signal that the speaker is listening to the interlocutor (uh-huh, hm, etc.). The research is based on SibLing, a Russian corpus of dialogue speech, part of which (339 min of speech) was manually segmented into backchannels and non-backchannels. A number of acoustic parameters were then calculated: duration, intensity, fundamental frequency, and pause duration. The data show that in Russian speech backchannels are shorter and have lower loudness and pitch than non-backchannels. Two classifiers were then tested: CART and SVM. The best performance was achieved using SVM (\(F_1\) = 0.651) with the following feature set: duration, maximum fundamental frequency, and melodic slope. The most valuable feature was duration.
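For readers unfamiliar with the \(F_1\) measure reported above, the following is a minimal sketch of how it is computed from a classifier's predictions, as the harmonic mean of precision and recall. The function name `f1_score_binary` and the toy labels are illustrative assumptions, not the authors' code or data. The abstract does not state whether \(F_1\) was computed for the backchannel class only or averaged over both classes; the sketch assumes the former.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: backchannels correctly detected.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: non-backchannels labelled as backchannels.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: backchannels that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy labels, invented for illustration: 1 = backchannel, 0 = non-backchannel.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

precision, recall, f1 = f1_score_binary(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because \(F_1\) ignores true negatives, it is a common choice when one class is of primary interest, as backchannels are here.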
References
Bailly, G., Elisei, F., Juphard, A., Moreaud, O.: Quantitative analysis of backchannels uttered by an interviewer during neuropsychological tests. In: Proceedings of Interspeech, pp. 2905–2909 (2016)
Beňuš, Š.: The prosody of backchannels in Slovak. In: Proceedings of 8th International Conference on Speech Prosody, pp. 75–79 (2016)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. Chapman and Hall/CRC (1984)
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Dobrushina, N.: The semantics of interjections in reactive turns [semantika mezhdometij v reaktivnykh replikakh]. Bull. Moscow Univ. 2, 136–145 (1998). (in Russian)
Edlund, J.: In search of the conversational homunculus: serving to understand spoken human face-to-face interaction. Doctoral thesis, KTH Royal Institute of Technology (2011)
Edlund, J., Heldner, M., Moubayed, S.A., Gravano, A., Hirschberg, J.: Very short utterances in conversation. Proc. FONETIK 2010, 11–16 (2010)
Gerassimenko, O.: Functions of feedback items a, aha, and hm in Russian phone conversation [Funktsii chastits obratnoj svyazi v telefonnom dialoge (na primere leksem a, aga i hm)]. Proceedings of the International Conference Dialog 1, 103–108 (2012). (in Russian)
Gravano, A., Beňuš, Š., Chávez, H., Hirschberg, J., Wilcox, L.: On the role of context in the interpretation of ‘okay’. In: Proceedings of 45th Conference of Association of Computer Linguistics, pp. 800–807 (2007)
Hara, K., Inoue, K., Takanashi, K., Kawahara, T.: Prediction of turn-taking using multitask learning with prediction of backchannels and fillers. In: Proceedings of Interspeech, pp. 991–995 (2018)
Jouvet, D., Laprie, Y.: Performance analysis of several pitch detection algorithms on simulated and real noisy speech data. In: Proceedings of 25th European Signal Processing Conference (EUSIPCO), pp. 1664–1668 (2017)
Jurafsky, D., et al.: Automatic detection of discourse structure for speech recognition and understanding. In: IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 88–95 (1997)
Kachkovskaia, T., et al.: SibLing corpus of Russian dialogue speech designed for research on speech entrainment. In: Proceedings of LREC (2020, in press)
Kawahara, T., Yamaguchi, T., Inoue, K., Takanashi, K., Ward, N.: Prediction and generation of backchannel form for attentive listening systems. In: Proceedings of Interspeech, pp. 2890–2894 (2016)
de Kok, I., Heylen, D.: A survey on evaluation metrics for backchannel prediction models. In: Proceedings of the Interdisciplinary Workshop on Feedback Behaviors in Dialog, pp. 15–18 (2012)
Malysheva, E.: Phonetic properties of backchannels in dialogue. Bachelor’s thesis, Saint Petersburg State University (2018). (in Russian)
Müller, M., et al.: Using neural networks for data-driven backchannel prediction: a survey on input features and training techniques. In: Proceedings of International Conference on Human-Computer Interaction (2015)
Park, H.W., Gelsomini, M., Lee, J.J., Zhu, T., Breazeal, C.: Backchannel opportunity prediction for social robot listeners. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 2308–2314 (2017)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ruede, R., Müller, M., Stüker, S., Waibel, A.: Enhancing backchannel prediction using word embeddings. In: Proceedings of Interspeech, pp. 879–883 (2017)
Ruede, R., Müller, M., Stüker, S., Waibel, A.: Yeah, right, uh-huh: a deep learning backchannel predictor. In: Eskenazi, M., Devillers, L., Mariani, J. (eds.) Advanced Social Interaction with Agents: 8th International Workshop on Spoken Dialog Systems, pp. 247–258 (2019). https://doi.org/10.1007/978-3-319-92108-2_25
Talkin, D.: REAPER: Robust Epoch And Pitch EstimatoR (2015). https://github.com/google/REAPER
Truong, K.P., Poppe, R., Heylen, D.: A rule-based backchannel prediction model using pitch and pause information. In: Proceedings of Interspeech, pp. 3058–3061 (2010)
Ward, N., Tsukahara, W.: Prosodic features which cue back-channel responses in English and Japanese. J. Pragmat. 32, 1177–1207 (2000)
Włodarczak, M., Heldner, M.: Respiratory turn-taking cues. In: Proceedings of Interspeech, pp. 1275–1279 (2016)
Acknowledgments
The research is supported by the Russian Science Foundation (Project 19-78-10046 “Phonetic manifestations of communication accommodation in dialogue”).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Kholiavin, P., Mamushina, A., Kocharov, D., Kachkovskaia, T. (2020). Automatic Detection of Backchannels in Russian Dialogue Speech. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_21
DOI: https://doi.org/10.1007/978-3-030-60276-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)