Revisiting the EmotiW challenge: how wild is it really?

Kächele, Markus; Schels, Martin; Meudt, Sascha; Palm, Günther; Schwenker, Friedhelm

doi:10.1007/s12193-015-0202-7

Classification of human emotions in movie snippets based on multiple features

Original Paper
Published: 12 February 2016

Volume 10, pages 151–162 (2016)

Journal on Multimodal User Interfaces

Markus Kächele¹, Martin Schels¹, Sascha Meudt¹, Günther Palm¹ & Friedhelm Schwenker¹

Abstract

This work focuses on emotion recognition in the wild using a wide range of audio, visual and meta features. To this end, we propose a method that uses evolutionary computing to optimize multi-modal fusion architectures. Extensive uni- and multi-modal experiments show the discriminative power of each computed feature set and fusion architecture. Furthermore, we summarize the EmotiW 2013/2014 challenges, review the conclusions that have been drawn from them, and compare our results with the state of the art on this dataset.

Notes

http://www.imdb.com/

Author information

Authors and Affiliations

Institute of Neural Information Processing, Ulm University, 89069, Ulm, Germany
Markus Kächele, Martin Schels, Sascha Meudt, Günther Palm & Friedhelm Schwenker

Corresponding author

Correspondence to Markus Kächele.

About this article

Cite this article

Kächele, M., Schels, M., Meudt, S. et al. Revisiting the EmotiW challenge: how wild is it really? J Multimodal User Interfaces 10, 151–162 (2016). https://doi.org/10.1007/s12193-015-0202-7

Received: 06 February 2015
Accepted: 12 October 2015
Published: 12 February 2016
Issue Date: June 2016
DOI: https://doi.org/10.1007/s12193-015-0202-7
