Abstract:
Speech emotion recognition (SER) is a fundamental step towards fluent human-machine interaction. One challenging problem in SER is obtaining an utterance-level feature representation for classification. Recent works on SER have made significant progress by using spectrogram features and introducing neural network methods, e.g., convolutional neural networks (CNNs). However, a fundamental limitation of CNNs is that they do not capture the spatial information in spectrograms, i.e., the positions of and relationships among low-level features such as pitch and formant frequencies. This paper presents a novel architecture based on capsule networks (CapsNets) for SER. The proposed system can take into account the spatial relationships of speech features in spectrograms and provide an effective pooling method for obtaining utterance-level global features. We also introduce a recurrent connection to CapsNets to improve the model's time sensitivity. We compare the proposed model to previously published results based on combined CNN-long short-term memory (CNN-LSTM) models on the benchmark corpus IEMOCAP over four emotions, i.e., neutral, angry, happy, and sad. Experimental results show that our model outperforms the baseline system on weighted accuracy (WA) (72.73% vs. 68.8%) and unweighted accuracy (UA) (59.71% vs. 59.4%), which demonstrates the effectiveness of CapsNets for SER.
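
The abstract describes the architecture only at a high level. The following PyTorch sketch shows one plausible way to combine a recurrent connection with capsule routing for SER: a convolutional front end produces primary capsules from a log-mel spectrogram, a GRU scans the time axis of the primary-capsule map, and dynamic routing-by-agreement (Sabour et al., 2017) aggregates the capsules into one emotion capsule per class, whose length serves as the class score. The 40x40 input size, layer widths, GRU placement, and three routing iterations are illustrative assumptions, not details taken from the paper.

    # Illustrative sketch only, NOT the authors' exact model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def squash(s, dim=-1, eps=1e-8):
        # Shrinks capsule vectors so their norm lies in [0, 1), keeping direction.
        n2 = (s ** 2).sum(dim=dim, keepdim=True)
        return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

    class RecurrentCapsNetSER(nn.Module):
        def __init__(self, n_classes=4, prim_types=32, prim_dim=8, out_dim=16):
            super().__init__()
            self.conv = nn.Conv2d(1, 64, kernel_size=9)        # 40x40 -> 32x32
            self.primary = nn.Conv2d(64, prim_types * prim_dim,
                                     kernel_size=9, stride=2)  # 32x32 -> 12x12
            self.prim_dim, self.out_dim = prim_dim, out_dim
            # Assumed recurrent connection: a GRU over the time axis of the
            # primary-capsule map, adding time sensitivity.
            self.gru = nn.GRU(prim_types * prim_dim, prim_types * prim_dim,
                              batch_first=True)
            n_prim = prim_types * 12 * 12
            # Per-capsule transformation matrices for dynamic routing.
            self.W = nn.Parameter(0.01 * torch.randn(n_prim, n_classes,
                                                     out_dim, prim_dim))

        def forward(self, x):                      # x: (B, 1, 40, 40)
            h = F.relu(self.conv(x))
            p = self.primary(h)                    # (B, 256, 12, 12)
            B, C, T, Fr = p.shape
            # Run the GRU over the T time steps, one sequence per frequency bin.
            seq = p.permute(0, 3, 2, 1).reshape(B * Fr, T, C)
            seq, _ = self.gru(seq)
            p = seq.reshape(B, Fr, T, C).permute(0, 3, 2, 1)
            # Regroup channels into capsule vectors, then squash.
            u = p.reshape(B, -1, self.prim_dim, T, Fr)
            u = squash(u.permute(0, 1, 3, 4, 2).reshape(B, -1, self.prim_dim))
            # Each primary capsule predicts each emotion capsule.
            u_hat = torch.einsum('nkop,bnp->bnko', self.W, u)
            # Dynamic routing-by-agreement, 3 iterations.
            b = torch.zeros(B, u_hat.size(1), u_hat.size(2), device=x.device)
            for _ in range(3):
                c = b.softmax(dim=2).unsqueeze(-1)
                v = squash((c * u_hat).sum(dim=1))   # (B, n_classes, out_dim)
                b = b + (u_hat * v.unsqueeze(1)).sum(-1)
            return v.norm(dim=-1)                  # class scores = capsule lengths

    if __name__ == "__main__":
        model = RecurrentCapsNetSER()
        logmel = torch.randn(2, 1, 40, 40)         # fake 40-frame, 40-band input
        print(model(logmel).shape)                 # torch.Size([2, 4])

In a full system one would typically train the capsule lengths with a margin loss and handle IEMOCAP's variable-length utterances by chunking or capsule-level pooling; those details are likewise assumptions here, not taken from the paper.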
Published in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 12-17 May 2019
Date Added to IEEE Xplore: 17 April 2019