Abstract
Speech can be characterized by its acoustic properties and by its semantic meaning, which is represented in textual speech transcriptions. Apart from the meaning content, textual information carries a substantial amount of paralinguistic information, which makes it possible to detect a speaker's emotions and sentiments by analyzing speech transcriptions. In this paper, we present an experimental framework and results for 3-way sentiment analysis (positive, negative, and neutral) and 4-way emotion classification (happy, angry, sad, and neutral) from textual speech transcriptions. In terms of Unweighted Average Recall (UAR), the results reach 91.93% and 88.99%, respectively, on RAMAS, a multimodal corpus containing recordings of Russian improvisational speech. Orthographic transcriptions of the speech recordings in the database are obtained using available pre-trained speech recognition systems. Text vectorization is implemented using the Bag-of-Words, Word2Vec, FastText, and BERT methods. The machine learning classifiers investigated include Support Vector Machine, Random Forest, Naive Bayes, and Logistic Regression. To the best of our knowledge, this is the first study of sentiment analysis and emotion recognition for both extemporaneous Russian speech and the RAMAS data in particular; therefore, the experimental results presented in this paper can be considered a baseline for further experiments.
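The abstract combines three ingredients: Bag-of-Words text vectorization, a classifier such as Naive Bayes, and UAR as the evaluation metric. The following is a minimal standard-library Python sketch of how these pieces fit together. It is not the authors' implementation. It assumes whitespace tokenization, add-one (Laplace) smoothing, and a handful of invented English toy sentences standing in for RAMAS transcriptions. The names `BowNaiveBayes`, `unweighted_average_recall`, `TRAIN_TEXTS`, and `TRAIN_LABELS` are hypothetical helpers written for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split. A real pipeline for Russian
    # transcriptions would likely also strip punctuation and lemmatize.
    return text.lower().split()


class BowNaiveBayes:
    """Hypothetical helper: multinomial Naive Bayes over Bag-of-Words counts,
    with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]  # BoW count vectors
        self.vocab = sorted({w for d in docs for w in d})
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Sum the word counts of all training documents per class.
        totals = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            totals[y].update(doc)
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * v
            self.log_lik[c] = {
                w: math.log((totals[c][w] + self.alpha) / denom) for w in self.vocab
            }
        return self

    def predict(self, texts):
        known = set(self.vocab)
        preds = []
        for t in texts:
            # Out-of-vocabulary tokens are ignored.
            doc = Counter(w for w in tokenize(t) if w in known)
            scores = {
                c: self.log_prior[c]
                + sum(k * self.log_lik[c][w] for w, k in doc.items())
                for c in self.classes
            }
            preds.append(max(scores, key=scores.get))
        return preds


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls over the classes present in y_true,
    each class weighted equally regardless of its frequency."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Invented toy data (NOT from RAMAS): one sentiment label per sentence.
TRAIN_TEXTS = [
    "i love this great day",
    "what a wonderful happy moment",
    "this is terrible and awful",
    "i hate this bad news",
    "the meeting is at noon",
    "the report is on the table",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

if __name__ == "__main__":
    clf = BowNaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    test_texts = ["great wonderful day", "terrible bad news", "the report at noon"]
    test_labels = ["positive", "negative", "neutral"]
    preds = clf.predict(test_texts)
    print(preds, unweighted_average_recall(test_labels, preds))
```

Because UAR averages recall over classes without weighting by class size, a classifier that always predicts the majority class scores only 1/K on a K-class task (25% for 4-way emotion classification). This is one reason it is a common choice for imbalanced emotion corpora. The Word2Vec, FastText, and BERT vectorizers and the other classifiers named above would replace the count vectors and the Naive Bayes step, respectively.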
Acknowledgements
This research is supported by the Russian Science Foundation (project No. 18-11-00145, research and development of the emotion recognition system), the Russian Foundation for Basic Research (project No. 18-07-01407), and the Government of Russia (grant No. 08-08).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dvoynikova, A., Verkholyak, O., Karpov, A. (2020). Emotion Recognition and Sentiment Analysis of Extemporaneous Speech Transcriptions in Russian. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_14
DOI: https://doi.org/10.1007/978-3-030-60276-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)