Deep Learning Our Everyday Emotions

A Short Overview

  • Chapter

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 37))

Abstract

Emotion is omnipresent in our daily lives and has a significant influence on our functional activities. Computer-based recognition and monitoring of affective cues is therefore of interest, for example when interacting with intelligent systems, or for health-care and security applications. In this light, this short overview focuses on audio/visual and textual cues as input modalities for automatic emotion recognition. In particular, it shows how these can best be modelled in a neural network context. This includes deep learning, and sparse auto-encoders for transfer learning of a compact task and population representation. It further outlines avenues towards massively autonomous, rich multitask learning and the confidence estimation needed to prepare such technology for real-life application.

Author information

Corresponding author

Correspondence to Björn Schuller.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Schuller, B. (2015). Deep Learning Our Everyday Emotions. In: Bassis, S., Esposito, A., Morabito, F. (eds) Advances in Neural Networks: Computational and Theoretical Issues. Smart Innovation, Systems and Technologies, vol 37. Springer, Cham. https://doi.org/10.1007/978-3-319-18164-6_33

  • DOI: https://doi.org/10.1007/978-3-319-18164-6_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18163-9

  • Online ISBN: 978-3-319-18164-6

  • eBook Packages: Engineering, Engineering (R0)