Modeling variable length phoneme sequences — A step towards linguistic information for speech emotion recognition in wider world | IEEE Conference Publication | IEEE Xplore

Abstract:

Vocal gestures play an important role in emotion expression and can be used by speech-based emotion recognition systems. This paper proposes the use of BLSTM neural networks to model salient variable-length phoneme sequences, which in turn can represent relevant vocal gestures. Unlike existing techniques, the proposed approach is not restricted to modelling phoneme sequences of a fixed length, and both the salience and the optimal modelling length of phoneme sequences are learnt from the training data. Three possible phoneme representations that can be modelled by BLSTMs are compared, and experimental results suggest that sequences of Phone Log Likelihood Ratios are more representative of emotions than sequences of phoneme labels represented as one-hot vectors. On the IEMOCAP database, the proposed approach achieves an Unweighted Average Recall (UAR) of 56.4%, an improvement of 6.5% in absolute terms over the previous approach of modelling fixed-length phoneme sequences on a 4-class classification problem. The proposed linguistic system is complementary to acoustic features, with a fused system leading to an absolute improvement of 5% in UAR.
Date of Conference: 23-26 October 2017
Date Added to IEEE Xplore: 01 February 2018
ISBN Information:
Electronic ISSN: 2156-8111
Conference Location: San Antonio, TX, USA
