
Revisiting the EmotiW challenge: how wild is it really?

Classification of human emotions in movie snippets based on multiple features

  • Original Paper
  • Published in: Journal on Multimodal User Interfaces

Abstract

The focus of this work is emotion recognition in the wild based on a broad set of audio, visual and meta features. To this end, a method is proposed that optimizes multi-modal fusion architectures using evolutionary computing. Extensive uni- and multi-modal experiments show the discriminative power of each computed feature set and of the fusion architectures. Furthermore, we summarize the EmotiW 2013/2014 challenges, review the conclusions that have been drawn from them, and compare our results with the state of the art on this dataset.
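The core technical idea named in the abstract, evolutionary optimization of a multi-modal fusion architecture, can be illustrated with a small sketch. The following toy example is not the authors' implementation: the data, the (mu + lambda)-style scheme, and all names are assumptions for illustration. It evolves non-negative weights for decision-level fusion of audio, video and meta class posteriors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: per-snippet class posteriors from three hypothetical
# unimodal classifiers (audio, video, meta) over 7 emotion classes.
n_samples, n_classes = 200, 7
modalities = ("audio", "video", "meta")
posteriors = {m: rng.dirichlet(np.ones(n_classes), size=n_samples)
              for m in modalities}
labels = rng.integers(0, n_classes, size=n_samples)

def fitness(weights):
    """Accuracy of a weighted-sum (decision-level) fusion of the posteriors."""
    fused = sum(w * posteriors[m] for w, m in zip(weights, modalities))
    return np.mean(fused.argmax(axis=1) == labels)

def evolve(pop_size=30, generations=50, sigma=0.1):
    """Minimal (mu + lambda)-style evolutionary search over fusion weights."""
    pop = rng.random((pop_size, len(modalities)))
    for _ in range(generations):
        # Gaussian mutation, clipped so the fusion weights stay non-negative.
        children = np.clip(pop + rng.normal(0.0, sigma, pop.shape), 0.0, None)
        combined = np.vstack([pop, children])
        scores = np.array([fitness(w) for w in combined])
        pop = combined[np.argsort(scores)[-pop_size:]]  # keep the fittest
    best = pop[-1]
    return best / best.sum(), fitness(best)

weights, acc = evolve()
print("evolved fusion weights:", dict(zip(modalities, np.round(weights, 3))))
print("fused accuracy on toy data: %.3f" % acc)
```

In the paper, the candidate encoding and fitness are presumably richer (e.g. which classifiers to fuse, at which level, and with which combination rule); this sketch only shows the weight-optimization core of such a scheme on synthetic data.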




Author information

Correspondence to Markus Kächele.


About this article


Cite this article

Kächele, M., Schels, M., Meudt, S. et al. Revisiting the EmotiW challenge: how wild is it really? J Multimodal User Interfaces 10, 151–162 (2016). https://doi.org/10.1007/s12193-015-0202-7

