DOI: 10.1145/2808196.2811638

Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks

Published: 26 October 2015

Abstract

Emotion recognition has been an active research area with wide applications and significant challenges. This paper presents our submission to the Audio/Visual Emotion Challenge (AVEC 2015), which aims to exploit audio, visual, and physiological signals to continuously predict the values of the emotion dimensions arousal and valence. Our system applies Recurrent Neural Networks (RNNs) to model temporal information. We explore several aspects to improve prediction performance, including the dominant modalities for arousal and valence prediction, the duration of feature windows, novel loss functions, the direction of Long Short-Term Memory (LSTM) processing (unidirectional vs. bidirectional), multi-task learning, and different structures for early feature fusion and late fusion. The best settings are chosen according to performance on the development set. Experimental results competitive with the challenge baseline demonstrate the effectiveness of the proposed methods.
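
To make the modeling approach concrete, below is a minimal sketch of a bidirectional LSTM that jointly regresses arousal and valence at every frame, in the multi-task spirit the abstract describes. The library (PyTorch), feature dimension, layer sizes, and the MSE training loss are illustrative assumptions, not the authors' actual configuration; in particular, the paper's novel loss functions are not reproduced here.

import torch
import torch.nn as nn

class BiLSTMEmotionRegressor(nn.Module):
    """Hypothetical bidirectional LSTM for continuous arousal/valence regression."""
    def __init__(self, feat_dim=100, hidden_dim=64, num_layers=2):
        super().__init__()
        # Bidirectional LSTM reads the per-frame feature sequence in both directions.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        # Multi-task head: two outputs per time step (arousal, valence), predicted jointly.
        self.head = nn.Linear(2 * hidden_dim, 2)

    def forward(self, x):
        # x: (batch, time, feat_dim) -> predictions: (batch, time, 2)
        out, _ = self.lstm(x)
        return self.head(out)

# Toy usage: 4 clips, 300 frames each, 100-dim fused features (all assumed sizes).
model = BiLSTMEmotionRegressor()
features = torch.randn(4, 300, 100)
targets = torch.zeros(4, 300, 2)          # placeholder gold annotations
loss = nn.MSELoss()(model(features), targets)
loss.backward()

In this sketch, early fusion would correspond to concatenating per-modality features before the LSTM, while late fusion would train one such model per modality and combine the per-frame predictions afterwards.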


Published In

AVEC '15: Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge
October 2015
90 pages
ISBN:9781450337434
DOI:10.1145/2808196

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. affective computing
  2. emotion recognition
  3. recurrent neural network

Qualifiers

  • Research-article

Conference

MM '15: ACM Multimedia Conference
October 26, 2015
Brisbane, Australia

Acceptance Rates

AVEC '15 Paper Acceptance Rate: 9 of 15 submissions (60%)
Overall Acceptance Rate: 52 of 98 submissions (53%)


