Emotion Recognition and Sentiment Analysis of Extemporaneous Speech Transcriptions in Russian

Dvoynikova, Anastasia; Verkholyak, Oxana; Karpov, Alexey

doi:10.1007/978-3-030-60276-5_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Speech can be characterized by its acoustic properties and by its semantic meaning, the latter represented as textual speech transcriptions. Beyond its semantic content, textual information carries a substantial amount of paralinguistic information, which makes it possible to detect a speaker’s emotions and sentiments by analyzing speech transcriptions. In this paper, we present an experimental framework and results for 3-way sentiment analysis (positive, negative, and neutral) and 4-way emotion classification (happy, angry, sad, and neutral) from textual speech transcriptions. Measured by Unweighted Average Recall (UAR), the results reach 91.93% and 88.99%, respectively, on the multimodal corpus RAMAS, which contains recordings of Russian improvisational speech. Orthographic transcriptions of the speech recordings in the database are obtained using available pre-trained speech recognition systems. Text vectorization is implemented using the Bag-of-Words, Word2Vec, FastText, and BERT methods. The machine learning classifiers investigated are Support Vector Machine, Random Forest, Naive Bayes, and Logistic Regression. To the best of our knowledge, this is the first study of sentiment analysis and emotion recognition both for extemporaneous Russian speech in general and for RAMAS data in particular; the experimental results presented here can therefore serve as a baseline for further experiments.
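
Since both results are reported as Unweighted Average Recall, a short illustration of the metric may help. UAR is the mean of the per-class recalls, so each class contributes equally regardless of how many samples it has. The sketch below is a minimal pure-Python version; the function name and the toy labels are assumptions made for illustration and are not taken from the paper or from RAMAS.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced 3-way sentiment labels (invented for illustration).
y_true = ["neutral"] * 6 + ["positive"] * 2 + ["negative"] * 2
y_pred = ["neutral"] * 6 + ["positive", "neutral"] + ["negative", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"UAR: {uar:.4f}")
print(f"Accuracy: {accuracy:.4f}")
```

On this toy data the majority class is predicted perfectly while each minority class is recalled only half the time, so UAR comes out lower than plain accuracy. This is why UAR is a common choice when class sizes are imbalanced.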

Acknowledgements

This research is supported by the Russian Science Foundation (project No. 18-11-00145, research and development of the emotion recognition system), as well as by the Russian Foundation for Basic Research (project No. 18-07-01407), and by the Government of Russia (grant No. 08-08).

Author information

Authors and Affiliations

St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences (SPIIRAS), St. Petersburg, Russia
Anastasia Dvoynikova, Oxana Verkholyak & Alexey Karpov
ITMO University, St. Petersburg, Russia
Anastasia Dvoynikova, Oxana Verkholyak & Alexey Karpov

Corresponding author

Correspondence to Oxana Verkholyak.

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Institute for Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Dvoynikova, A., Verkholyak, O., Karpov, A. (2020). Emotion Recognition and Sentiment Analysis of Extemporaneous Speech Transcriptions in Russian. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science (LNAI), vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_14

DOI: https://doi.org/10.1007/978-3-030-60276-5_14
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)
