
Speech emotion recognition using multichannel parallel convolutional recurrent neural networks based on gammatone auditory filterbank


Abstract:

Speech Emotion Recognition (SER) using deep learning methods based on computational models of the human auditory system is a new way to identify emotional state. In this paper, we propose to utilize multichannel parallel convolutional recurrent neural networks (MPCRNN) to extract salient features from the raw waveform based on a Gammatone auditory filterbank, and show that this method is effective for speech emotion recognition. We first divide the speech signal into segments and then obtain multichannel data using the Gammatone auditory filterbank, which serves as a first stage before applying the MPCRNN to extract the features most relevant to emotion recognition from speech. We subsequently obtain an emotion-state probability distribution for each speech segment. Finally, utterance-level features are constructed from the segment-level probability distributions and fed into a support vector machine (SVM) to identify the emotions. The experimental results show that speech emotion features can be effectively learned with the proposed deep learning approach based on the Gammatone auditory filterbank.
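The abstract's first stage decomposes the raw waveform into multichannel data with a Gammatone auditory filterbank. A minimal sketch of such a filterbank is shown below, using the standard Glasberg–Moore ERB approximation and 4th-order gammatone impulse responses; the channel count, frequency range, and impulse-response length are illustrative assumptions, since the paper's exact configuration is not given in the abstract.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter
    at center frequency fc, per the Glasberg-Moore approximation."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.064, order=4):
    """Finite impulse response of a 4th-order gammatone filter:
    g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)  # conventional bandwidth scaling factor
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(g).max()  # normalize peak amplitude

def gammatone_filterbank(x, fs, n_channels=32, fmin=80.0, fmax=None):
    """Filter waveform x into n_channels bands with center
    frequencies spaced uniformly on the ERB-rate scale."""
    fmax = fmax or 0.9 * fs / 2
    erb_lo = 21.4 * np.log10(4.37e-3 * fmin + 1.0)
    erb_hi = 21.4 * np.log10(4.37e-3 * fmax + 1.0)
    fcs = (10 ** (np.linspace(erb_lo, erb_hi, n_channels) / 21.4) - 1.0) / 4.37e-3
    # one output channel per center frequency, same length as the input
    return np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same") for fc in fcs])

# usage: decompose a 1-second 440 Hz tone into 32 auditory channels
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)
channels = gammatone_filterbank(x, fs, n_channels=32)
```

In the pipeline described above, each segment's multichannel output (here a 32 × samples array) would be the input to the MPCRNN rather than hand-crafted spectral features.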
Date of Conference: 12-15 December 2017
Date Added to IEEE Xplore: 08 February 2018
Conference Location: Kuala Lumpur, Malaysia

