Automatic Detection of Backchannels in Russian Dialogue Speech

Kholiavin, Pavel; Mamushina, Anna; Kocharov, Daniil; Kachkovskaia, Tatiana

doi:10.1007/978-3-030-60276-5_21

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper deals with the acoustic properties of backchannels: turns within a dialogue that do not convey information but signal that the speaker is listening to the interlocutor (uh-huh, hm, etc.). The research is based on SibLing, a Russian corpus of dialogue speech, a part of which (339 min of speech) was manually segmented into backchannels and non-backchannels. A number of acoustic parameters were then calculated: duration, intensity, fundamental frequency, and pause duration. Our data show that in Russian speech backchannels are shorter and have lower loudness and pitch than non-backchannels. Two classifiers were then tested: CART and SVM. The best performance was achieved using SVM (\(F_1\) = 0.651) with the following feature set: duration, maximum fundamental frequency, and melodic slope. The most valuable feature was duration.
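The classification setup summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data below are synthetic, the helper names (`melodic_slope`, `make_synthetic_turns`) are hypothetical, the use of scikit-learn is an assumption, and melodic slope is taken here to be the least-squares slope of the F0 contour, which is only one possible definition. The scores it prints say nothing about the results reported in the paper.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def melodic_slope(times_s, f0_hz):
    """Least-squares slope of an F0 contour in Hz/s (assumed definition)."""
    slope, _intercept = np.polyfit(times_s, f0_hz, deg=1)
    return float(slope)


def make_synthetic_turns(n, rng):
    """Hypothetical data: one row per turn with
    [duration_s, max_f0_hz, slope_hz_per_s]; label 1 = backchannel."""
    is_bc = rng.random(n) < 0.3
    # Assumption mirroring the abstract: backchannels are shorter and lower.
    duration = np.where(is_bc, rng.gamma(2.0, 0.2, n), rng.gamma(2.0, 1.0, n))
    max_f0 = np.where(is_bc, rng.normal(170, 30, n), rng.normal(200, 40, n))
    slope = rng.normal(0, 20, n)
    X = np.column_stack([duration, max_f0, slope])
    return X, is_bc.astype(int)


rng = np.random.default_rng(0)
X, y = make_synthetic_turns(2000, rng)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # scikit-learn's decision tree is an optimised CART variant.
    "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
    # SVMs are scale-sensitive, so features are standardised first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced")),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # F1 is computed for the positive (backchannel) class.
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {scores[name]:.3f}")
```

With real data, each row of `X` would be computed from a manually segmented turn, for example by passing the turn's F0 track to `melodic_slope`. F1 is a natural metric here because backchannels are likely to be the minority class.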

Acknowledgments

The research is supported by the Russian Science Foundation (Project 19-78-10046 “Phonetic manifestations of communication accommodation in dialogue”).

Author information

Authors and Affiliations

Saint Petersburg State University, Universitetskaya emb., 11, St. Petersburg, Russia
Pavel Kholiavin, Anna Mamushina, Daniil Kocharov & Tatiana Kachkovskaia

Corresponding author

Correspondence to Pavel Kholiavin.

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Institute for Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Kholiavin, P., Mamushina, A., Kocharov, D., Kachkovskaia, T. (2020). Automatic Detection of Backchannels in Russian Dialogue Speech. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_21

DOI: https://doi.org/10.1007/978-3-030-60276-5_21
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)
