Learning an Unsupervised and Interpretable Representation of Emotion from Speech

Wang, Siwei; Soladié, Catherine; Séguier, Renaud

doi:10.1007/978-3-030-60276-5_61


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)

Included in the following conference series:

International Conference on Speech and Computer


Abstract

One of the severe obstacles to naturalistic human affective computing is that emotions are complex constructs with fuzzy boundaries and substantial individual variation. An important issue in emotion analysis is therefore how to generate a person-specific representation of emotion in an unsupervised manner. This paper presents a fully unsupervised method that combines an autoencoder with Principal Component Analysis to build an emotion representation from speech signals. Because each person expresses emotions differently, the method is applied at the subject level. We also investigate the relevance of such a representation. Experiments on the Emo-DB, IEMOCAP, and SEMAINE databases show that the proposed representation of emotion is invariant across subjects and similar to the representation built by psychologists, especially on the arousal dimension.
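The subject-level pipeline described in the abstract (autoencoder, then PCA on the learned codes, then a comparison with a psychologist-style arousal annotation) can be illustrated with a minimal NumPy sketch. Everything below is an assumption for illustration, not the authors' implementation: the one-hidden-layer autoencoder (tanh encoder, linear decoder), its size and learning rate, per-subject z-scoring, the use of Spearman rank correlation for the comparison, and the synthetic "speakers" whose features are a noisy, speaker-specific mixture of a hidden arousal value. In practice the input would be per-utterance acoustic feature vectors.

```python
import numpy as np


def zscore(X):
    """Standardise each feature within one subject (subject-level normalisation)."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)


def train_autoencoder(X, code_dim=8, epochs=1000, lr=0.002, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder) by
    full-batch gradient descent on the mean squared reconstruction error.
    Architecture and hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # bottleneck codes
        X_hat = H @ W2 + b2                # reconstruction
        G = 2.0 * (X_hat - X) / n          # dL/dX_hat for L = mean_i ||x_hat_i - x_i||^2
        G_H = (G @ W2.T) * (1.0 - H ** 2)  # back-propagate through tanh (uses W2 before update)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ G_H)
        b1 -= lr * G_H.sum(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1)  # trained encoder


def principal_axes(H, k=2):
    """Project codes onto their first k principal components (PCA via SVD)."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:k].T


def spearman(x, y):
    """Spearman rank correlation (assumes no tied values)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def subject_representation(X, k=2, seed=0):
    """Per-subject pipeline: normalise, train an autoencoder, then PCA on its codes."""
    Xs = zscore(X)
    encode = train_autoencoder(Xs, seed=seed)
    return principal_axes(encode(Xs), k=k)


# Demo on synthetic data: two hypothetical speakers who "express" the same
# hidden arousal through different feature mixtures.
rng = np.random.default_rng(42)
results = {}
for subject in ("spk_A", "spk_B"):
    arousal = rng.normal(size=200)               # hypothetical per-utterance arousal
    w = rng.normal(size=20)                      # speaker-specific expression pattern
    X = np.outer(arousal, w) + 0.2 * rng.normal(size=(200, 20))
    Y = subject_representation(X)
    # PCA axes have an arbitrary sign, so compare correlation magnitudes.
    results[subject] = abs(spearman(Y[:, 0], arousal))
print(results)
```

Because a separate model is fitted for each speaker, the first principal axis can line up with arousal for both speakers even though their feature mixtures differ, which is the kind of cross-subject invariance the abstract reports. The sign of each PCA axis must be fixed by some convention before representations from different subjects are compared directly.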



Acknowledgments

This work was funded by the China Scholarship Council (CSC) and by the French government funding program ANR REFLET (No. ANR-17-CE19-0020-01).

Author information

Authors and Affiliations

FAST Research Team, CentraleSupélec, IETR (UMR CNRS 6164), Rennes, France
Siwei Wang, Catherine Soladié & Renaud Séguier


Corresponding author

Correspondence to Siwei Wang.

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Institute for Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova


About this paper

Cite this paper

Wang, S., Soladié, C., Séguier, R. (2020). Learning an Unsupervised and Interpretable Representation of Emotion from Speech. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_61


DOI: https://doi.org/10.1007/978-3-030-60276-5_61
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60275-8
Online ISBN: 978-3-030-60276-5
eBook Packages: Computer Science, Computer Science (R0)
