Abstract
Emotion plays a significant role in human-computer interaction. Continuing improvements in speech technology have enabled many new and fascinating applications in human-computer interaction, context-aware computing, and computer-mediated communication. Such applications require reliable online recognition of the user's affect, yet most emotion recognition systems operate on speech from an isolated short sentence or word. We present a framework for online emotion recognition from speech. On the front end, a voice activity detection algorithm segments the input speech, and features are estimated to model long-term properties. Dimensional and continuous emotion recognition is then performed with a Relevance Units Machine (RUM). The advantages of the proposed system are: (i) computational efficiency at run time, so that regression outputs can be produced continuously in pseudo real time; (ii) sparsity superior to the well-known Support Vector Regression (SVR) and the Relevance Vector Machine for regression (RVR); and (iii) predictive performance comparable to SVR and RVR.
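The pipeline described above (voice activity detection, long-term feature estimation, sparse kernel regression) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the energy-threshold VAD, the mean/std features, the relevance units, and the weights are all hypothetical placeholders, and only the RUM prediction form (a weighted sum of kernels over a sparse set of units) is shown, not its training.

```python
import numpy as np

def energy_vad(signal, frame_len=256, threshold=0.01):
    """Crude energy-based voice activity detection (illustrative only)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    return energy > threshold  # boolean mask of voiced frames

def long_term_features(signal, mask, frame_len=256):
    """Summarise the voiced frames with long-term statistics
    (here just mean and std of frame energy, as a stand-in)."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames[mask] ** 2).mean(axis=1)
    return np.array([energy.mean(), energy.std()])

def rum_predict(x, units, weights, gamma=1.0):
    """Sparse kernel prediction, y = sum_i w_i * k(x, u_i),
    over a small set of relevance units (training not shown)."""
    k = np.exp(-gamma * ((units - x) ** 2).sum(axis=1))
    return float(weights @ k)

# Toy end-to-end run on synthetic "audio": 8 silent frames, 8 loud frames.
rng = np.random.default_rng(0)
sig = np.concatenate([0.001 * rng.standard_normal(2048),   # near-silence
                      0.5 * rng.standard_normal(2048)])    # "speech"
mask = energy_vad(sig)
feat = long_term_features(sig, mask)
units = np.array([[0.1, 0.05], [0.3, 0.2]])  # hypothetical relevance units
weights = np.array([0.4, -0.2])              # hypothetical learned weights
arousal = rum_predict(feat, units, weights)  # one continuous dimension
```

Because the prediction is a sum over only a handful of units, each incoming segment costs a few kernel evaluations, which is what makes continuous pseudo-real-time output feasible.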
Acknowledgments
The research reported in this paper has been supported in part by the CSC-VUB scholarship grant [2009]3012, and the EU FP7 project ALIZ-E grant 248116.
Wang, F., Sahli, H., Gao, J. et al. Relevance units machine based dimensional and continuous speech emotion prediction. Multimed Tools Appl 74, 9983–10000 (2015). https://doi.org/10.1007/s11042-014-2319-1