Emotion Recognition and Sentiment Analysis of Extemporaneous Speech Transcriptions in Russian

  • Conference paper
Speech and Computer (SPECOM 2020)

Abstract

Speech can be characterized by its acoustic properties and by its semantic meaning, the latter represented as textual speech transcriptions. Apart from the semantic content, textual information carries a substantial amount of paralinguistic information that makes it possible to detect a speaker's emotions and sentiments by analyzing speech transcriptions. In this paper, we present an experimental framework and results for 3-way sentiment analysis (positive, negative, and neutral) and 4-way emotion classification (happy, angry, sad, and neutral) from textual speech transcriptions. In terms of Unweighted Average Recall (UAR), the results reach 91.93% and 88.99%, respectively, on the multimodal corpus RAMAS, which contains recordings of Russian improvisational speech. Orthographic transcriptions of the speech recordings in the database are obtained using available pre-trained speech recognition systems. Text vectorization is implemented using the Bag-of-Words, Word2Vec, FastText and BERT methods. The machine learning classifiers investigated are Support Vector Machine, Random Forest, Naive Bayes and Logistic Regression. To the best of our knowledge, this is the first study of sentiment analysis and emotion recognition both for extemporaneous Russian speech in general and for RAMAS data in particular; the experimental results presented in this paper can therefore be considered a baseline for further experiments.
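For readers unfamiliar with the metric reported above, the following is a minimal sketch of how Unweighted Average Recall can be computed: recall is calculated separately for each class and then averaged, so that small classes count as much as large ones. The function name `unweighted_average_recall` and the toy labels are assumptions made for illustration; they are not taken from the paper or from RAMAS.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("empty input")
    total = defaultdict(int)    # number of true samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration only (hypothetical, not RAMAS data):
# a classifier that always answers "neutral" on an imbalanced 3-way task.
y_true = ["neutral"] * 8 + ["positive", "negative"]
y_pred = ["neutral"] * 10

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
# prints: UAR = 0.333, accuracy = 0.800
```

In this toy case accuracy looks high because the majority class dominates, while UAR falls to chance level for three classes. This is the usual motivation for reporting UAR on imbalanced emotion and sentiment data.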

Acknowledgements

This research is supported by the Russian Science Foundation (project No. 18-11-00145, research and development of the emotion recognition system), as well as by the Russian Foundation for Basic Research (project No. 18-07-01407), and by the Government of Russia (grant No. 08-08).

Author information

Corresponding author

Correspondence to Oxana Verkholyak.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Dvoynikova, A., Verkholyak, O., Karpov, A. (2020). Emotion Recognition and Sentiment Analysis of Extemporaneous Speech Transcriptions in Russian. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_14

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
