Combining modality-specific extreme learning machines for emotion recognition in the wild

Kaya, Heysem; Salah, Albert Ali

doi:10.1007/s12193-015-0175-6

Original Paper
Published: 01 May 2015

Volume 10, pages 139–149 (2016)

Journal on Multimodal User Interfaces

Heysem Kaya¹ &
Albert Ali Salah¹

Abstract

This paper proposes extreme learning machines (ELM) for modeling audio and video features for emotion recognition under uncontrolled conditions. The ELM paradigm is a fast and accurate learning alternative for single-layer feedforward networks. We experiment on the Acted Facial Expressions in the Wild (AFEW) corpus, which features seven discrete emotions, and adhere to the EmotiW 2014 challenge protocols. In our study, kernel ELM yields better results than basic ELM for both modalities. We contrast several fusion approaches and, by combining one audio and three video sub-systems, reach a test set accuracy of 50.12% (against a video-only baseline of 33.70%) on the seven-class (i.e., six basic emotions plus neutral) EmotiW 2014 Challenge. We also compare ELM with the partial least squares regression based classification used in the top-performing system of EmotiW 2014, and discuss the advantages of both approaches.
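
To make the kernel ELM mentioned in the abstract concrete, the following is a minimal sketch of the commonly used closed-form solution, in which the output weights are obtained as \(\beta = (K + I/C)^{-1} T\) and a new sample is scored by \(k(x)^{\top} \beta\). The RBF kernel, the values of `C` and `gamma`, the function names, and the toy two-dimensional data are all assumptions made for illustration; they are not taken from the paper and do not reproduce its features or settings.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_elm_fit(X, y, n_classes, C=10.0, gamma=1.0):
    # Targets coded as +1 for the true class and -1 elsewhere (an assumed coding).
    T = -np.ones((X.shape[0], n_classes))
    T[np.arange(X.shape[0]), y] = 1.0
    K = rbf_kernel(X, X, gamma)
    # Regularized least-squares solution: beta = (K + I/C)^{-1} T.
    return np.linalg.solve(K + np.eye(X.shape[0]) / C, T)

def kernel_elm_predict(X_train, beta, X_new, gamma=1.0):
    # One score per class; the predicted class is the argmax over columns.
    return rbf_kernel(X_new, X_train, gamma) @ beta

# Toy data (hypothetical): three well-separated 2-D clusters, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 20)
X = centers[y] + 0.3 * rng.standard_normal((60, 2))

beta = kernel_elm_fit(X, y, n_classes=3)
scores = kernel_elm_predict(X, beta, centers)
pred = scores.argmax(axis=1)
```

Training reduces to a single linear solve over the kernel matrix, which is one reason the approach is described as fast; in practice `C` and the kernel parameter would be tuned on a validation set.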

Notes

http://extreme-learning-machines.org/.
The z-score ranges are \(\{(-\infty ,-2.5],(-2.5,-1.5],(-1.5,-0.5],(-0.5,0.5],(0.5,1.5],(1.5,2.5],(2.5,\infty )\}\).
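
As a small illustration of the seven half-open ranges in the note above, the sketch below maps z-scores to bin indices 0 to 6 with `numpy.digitize`. The note as preserved here does not say which quantity is discretized, so the input values and the function name `zscore_bin` are hypothetical.

```python
import numpy as np

# Upper edges of the first six ranges; the seventh range is (2.5, inf).
EDGES = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])

def zscore_bin(z):
    # right=True makes each interval closed on the right, e.g. (-2.5, -1.5].
    return np.digitize(z, EDGES, right=True)

z = np.array([-3.0, -2.5, -2.0, 0.0, 0.5, 0.6, 2.5, 2.6])
bins = zscore_bin(z)
```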

Author information

Authors and Affiliations

Department of Computer Engineering, Boğaziçi University, 34342, Bebek, Istanbul, Turkey
Heysem Kaya & Albert Ali Salah

Corresponding author

Correspondence to Heysem Kaya.

About this article

Cite this article

Kaya, H., Salah, A.A. Combining modality-specific extreme learning machines for emotion recognition in the wild. J Multimodal User Interfaces 10, 139–149 (2016). https://doi.org/10.1007/s12193-015-0175-6

Received: 30 January 2015
Accepted: 21 April 2015
Published: 01 May 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s12193-015-0175-6
