ISCA Archive Interspeech 2021

Stochastic Process Regression for Cross-Cultural Speech Emotion Recognition

Mani Kumar T, Enrique Sanchez, Georgios Tzimiropoulos, Timo Giesbrecht, Michel Valstar

In this work, we pose continuous apparent emotion recognition from speech as a problem of learning distributions of functions, and do so using Stochastic Process Regression. We presume that the relation between speech signals and their corresponding emotion labels is governed by some underlying stochastic process, in contrast to existing speech emotion recognition methods, which are mostly based on deterministic regression models (static or recurrent). We treat each training sequence as an instance of the underlying stochastic process, which we aim to discover using a neural latent variable model. The model approximates the distribution of functions with a stochastic latent variable using an encoder-decoder composition: the encoder infers the distribution over the latent variable, which the decoder uses to predict the distribution of output emotion labels. To this end, we build on the previously proposed Neural Processes theory by using (a) noisy label predictions of a backbone instead of ground truth labels for latent variable inference and (b) recurrent encoder-decoder models to alleviate the effect of the commonly encountered temporal misalignment between audio features and emotion labels due to annotator reaction lag. We validated our method on the AVEC’19 cross-cultural emotion recognition dataset, achieving state-of-the-art results.
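The encoder-decoder composition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, parameter layout, random untrained weights, scalar latent variable, and the assumption that labels lie in [-1, 1] are all hypothetical. For brevity it uses the mean aggregation of the original Neural Processes formulation rather than the recurrent encoder-decoder the abstract proposes.

```python
import math
import random


def softplus(v):
    # Numerically stable softplus, used to keep standard deviations positive.
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def linear(weights, bias, inputs):
    # Single-output affine map: w . inputs + b.
    return sum(w * v for w, v in zip(weights, inputs)) + bias


def init_params(feature_dim, rng):
    # Hypothetical, untrained parameters drawn at random for illustration only.
    def g():
        return rng.gauss(0.0, 0.5)
    return {
        "enc_w": [g() for _ in range(feature_dim + 1)], "enc_b": g(),
        "mu_w": g(), "mu_b": g(), "sig_w": g(), "sig_b": g(),
        "dec_w": [g() for _ in range(feature_dim + 1)], "dec_b": g(),
        "out_w": g(), "out_b": g(), "std_w": g(), "std_b": g(),
    }


def encode(features, noisy_labels, params):
    """Infer (mu_z, sigma_z) from (feature vector, noisy backbone label) pairs."""
    # Per-step representation followed by mean aggregation over time
    # (a simplification; the abstract describes a recurrent encoder).
    reps = [math.tanh(linear(params["enc_w"], params["enc_b"], x + [y]))
            for x, y in zip(features, noisy_labels)]
    r = sum(reps) / len(reps)
    mu_z = params["mu_w"] * r + params["mu_b"]
    sigma_z = 0.1 + 0.9 * softplus(params["sig_w"] * r + params["sig_b"])
    return mu_z, sigma_z


def decode(features, z, params):
    """Predict a Gaussian (mean, std) over the emotion label at every time step."""
    predictions = []
    for x in features:
        h = math.tanh(linear(params["dec_w"], params["dec_b"], x + [z]))
        mean = math.tanh(params["out_w"] * h + params["out_b"])  # assumes labels in [-1, 1]
        std = 0.05 + softplus(params["std_w"] * h + params["std_b"])
        predictions.append((mean, std))
    return predictions


rng = random.Random(0)
params = init_params(feature_dim=3, rng=rng)

# Toy sequence: 5 time steps of 3-dimensional audio features (made-up values).
features = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]
# Stand-in for the backbone's noisy label predictions.
noisy_labels = [rng.uniform(-1.0, 1.0) for _ in range(5)]

mu_z, sigma_z = encode(features, noisy_labels, params)
z = mu_z + sigma_z * rng.gauss(0.0, 1.0)  # reparameterised sample of the latent
predictions = decode(features, z, params)
```

Each fresh sample of `z` yields a different predicted label trajectory for the same input sequence, which is the sense in which such a model represents a distribution over functions rather than a single deterministic mapping.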


doi: 10.21437/Interspeech.2021-610

Cite as: T, M.K., Sanchez, E., Tzimiropoulos, G., Giesbrecht, T., Valstar, M. (2021) Stochastic Process Regression for Cross-Cultural Speech Emotion Recognition. Proc. Interspeech 2021, 3390-3394, doi: 10.21437/Interspeech.2021-610

@inproceedings{t21_interspeech,
  author={Mani Kumar T and Enrique Sanchez and Georgios Tzimiropoulos and Timo Giesbrecht and Michel Valstar},
  title={{Stochastic Process Regression for Cross-Cultural Speech Emotion Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3390--3394},
  doi={10.21437/Interspeech.2021-610}
}