Abstract
Recently, the focus of research in affective computing has shifted to spontaneous interactions and time-continuous annotations. Such data broaden the possibilities for real-world emotion recognition in the wild, but they also introduce new challenges. In affective computing, data collection is neither a trivial nor a cheap task, so it is rational to use all the data available. However, due to the subjective nature of emotions, differences in cultural and linguistic features, and varying environmental conditions, combining affective speech data is not a straightforward process. In this paper, we analyze the difficulties of automatic emotion recognition in a time-continuous, dimensional scenario using data from the RECOLA, SEMAINE and CreativeIT databases. We propose to employ a simple but effective strategy called “mixup” to overcome the gap in feature-target and target-target covariance structures across corpora. We showcase the performance of our system in three cross-corpus experimental setups: single-corpus training, two-corpora training, and training on augmented (mixed-up) data. Our findings show that the prediction behavior of trained models depends heavily on the covariance structure of the training corpus, and that mixup is very effective in improving the cross-corpus acoustic emotion recognition performance of context-dependent LSTM models.
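As a rough illustration of the augmentation strategy, the sketch below applies mixup (Zhang et al., ref. 32) to pairs of acoustic feature sequences and their time-continuous dimensional labels. The array shapes, the Beta parameter alpha, and the function name are illustrative assumptions, not the configuration reported in the paper.

# A minimal mixup sketch for sequence data; shapes and alpha are assumptions.
import numpy as np

def mixup_sequences(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    # x_a, x_b: (n, timesteps, features) acoustic feature sequences
    # y_a, y_b: (n, timesteps, dims) time-continuous arousal/valence labels
    rng = rng if rng is not None else np.random.default_rng()
    # One mixing coefficient per sequence pair, drawn from Beta(alpha, alpha);
    # the (n, 1, 1) shape broadcasts over time steps and feature dimensions.
    lam = rng.beta(alpha, alpha, size=(len(x_a), 1, 1))
    x_mix = lam * x_a + (1.0 - lam) * x_b  # convex combination of features
    y_mix = lam * y_a + (1.0 - lam) * y_b  # same coefficients for the labels
    return x_mix, y_mix

Drawing the pairs from different corpora (e.g., RECOLA paired with SEMAINE) interpolates both features and targets, which is what lets the augmented data bridge the mismatched covariance structures described above.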
References
Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.F., Weiss, B.: A database of German emotional speech. In: Ninth European Conference on Speech Communication and Technology (2005)
Chollet, F., et al.: Keras (2015). https://keras.io
Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., Schröder, M.: ‘FEELTRACE’: an instrument for recording perceived emotion in real time. In: ISCA Tutorial and Research Workshop (ITRW) on Speech and Emotion (2000)
Eyben, F., Wöllmer, M., Schuller, B.: openSMILE: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the 18th ACM International Conference on Multimedia, pp. 1459–1462. ACM (2010)
Fedotov, D., Ivanko, D., Sidorov, M., Minker, W.: Contextual dependencies in time-continuous multidimensional affect recognition. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC) (2018)
Gunes, H., Pantic, M.: Automatic, dimensional and continuous emotion recognition. Int. J. Synth. Emotions 1(1), 68–99 (2010)
Haq, S., Jackson, P.J.: Multimodal emotion recognition. In: Machine Audition: Principles, Algorithms and Systems, pp. 398–423 (2010)
Kaya, H., Fedotov, D., Yeşilkanat, A., Verkholyak, O., Zhang, Y., Karpov, A.: LSTM based cross-corpus and cross-task acoustic emotion recognition. In: INTERSPEECH 2018. ISCA (2018, in press)
Kaya, H., Karpov, A.A.: Efficient and effective strategies for cross-corpus acoustic emotion recognition. Neurocomputing 275, 1028–1034 (2018)
Lim, N.: Cultural differences in emotion: differences in emotional arousal level between the east and the west. Integr. Med. Res. 5(2), 105–109 (2016)
Makarova, V., Petrushin, V.A.: RUSLANA: A database of Russian emotional utterances. In: Seventh International Conference on Spoken Language Processing (2002)
Mariooryad, S., Busso, C.: Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations. In: Affective Computing and Intelligent Interaction (ACII), pp. 85–90. IEEE (2013)
McKeown, G., Valstar, M., Cowie, R., Pantic, M., Schroder, M.: The SEMAINE database: annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Trans. Affect. Comput. 3(1), 5–17 (2012)
Metallinou, A., Lee, C.C., Busso, C., Carnicke, S., Narayanan, S.: The USC CreativeIT database: a multimodal database of theatrical improvisation. In: Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, p. 55 (2010)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 807–814 (2010)
Nicolaou, M.A., Gunes, H., Pantic, M.: Automatic segmentation of spontaneous data using dimensional labels from multiple coders. In: Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality. German Research Center for AI (DFKI) (2010)
Nicolle, J., Rapp, V., Bailly, K., Prevost, L., Chetouani, M.: Robust continuous prediction of human emotions using multiscale dynamic cues. In: Proceedings of the ACM International Conference on Multimodal Interaction, pp. 501–508 (2012)
Paltoglou, G., Thelwall, M.: Seeing stars of valence and arousal in blog posts. IEEE Trans. Affect. Comput. 4(1), 116–123 (2013)
Petta, P., Pelachaud, C., Cowie, R.: Emotion-Oriented Systems: The Humaine Handbook. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-15184-2
Ringeval, F., Sonderegger, A., Sauer, J., Lalanne, D.: Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. In: 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), pp. 1–8. IEEE (2013)
Schuller, B., Steidl, S., Batliner, A., Epps, J., Eyben, F., Ringeval, F., Marchi, E., Zhang, Y.: The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)
Schuller, B., Vlasenko, B., Eyben, F., Wollmer, M., Stuhlsatz, A., Wendemuth, A., Rigoll, G.: Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Trans. Affect. Comput. 1(2), 119–131 (2010)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
Valstar, M., Gratch, J., Schuller, B., Ringeval, F., Lalanne, D., Torres Torres, M., Scherer, S., Stratou, G., Cowie, R., Pantic, M.: AVEC 2016: depression, mood, and emotion recognition workshop and challenge. In: Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, pp. 3–10. ACM (2016)
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
Acknowledgments
This research is supported by the Russian Science Foundation (project No. 18-11-00145).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Fedotov, D., Kaya, H., Karpov, A. (2018). Context Modeling for Cross-Corpus Dimensional Acoustic Emotion Recognition: Challenges and Mixup. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol. 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_17
DOI: https://doi.org/10.1007/978-3-319-99579-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3