
Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis

  • Original Paper
Journal on Multimodal User Interfaces

Abstract

Emotion recognition in the wild has recently attracted increasing attention in computer vision and affective computing. In contrast to classical emotion recognition, emotion recognition in the wild is more challenging because the databases are collected under real-world conditions. Such databases inevitably contain various adverse samples whose emotion labels are hard to identify using classical emotion recognition methods developed on idealized laboratory databases, which significantly increases the difficulty of the recognition task on wild databases. In this paper, we propose a transductive transfer learning framework to handle the problem of emotion recognition in the wild. We develop a sparse transductive transfer linear discriminant analysis (STTLDA) method for facial expression recognition and speech emotion recognition under real-world environments. To the best of our knowledge, we are the first to treat emotion recognition in the wild as a transfer learning problem and to use a transductive transfer learning method to eliminate the distribution difference between training and testing samples caused by the “wild” conditions. We conduct extensive experiments on the SFEW 2.0 and AFEW 4.0 and 5.0 (audio part) databases, which were used in the Emotion Recognition in the Wild Challenge (EmotiW 2014 and 2015), to evaluate the proposed method. Experimental results demonstrate that STTLDA achieves satisfactory performance compared with the baselines provided by the challenge organizers and with several competitive methods. In addition, we report our earlier results in the static-image facial expression recognition challenge of EmotiW 2015, where we achieved an accuracy of 50% on the Test set, a 10.87% improvement over the baseline released by the challenge organizers.
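The paper's STTLDA formulation (including its sparsity term) is not reproduced on this page. As a rough illustration of the transductive idea the abstract describes, the sketch below fits a plain two-class Fisher LDA on labeled source (training) data, pseudo-labels the unlabeled target (testing) data, and refits on the combined set so the projection absorbs the source-target distribution shift. All function names and the synthetic setup are our own illustrative assumptions, not the authors' method.

```python
import numpy as np

def lda_direction(X, y):
    """Two-class Fisher LDA direction: w = Sw^{-1} (mu1 - mu0)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter (sum of per-class scatter matrices), with a
    # small ridge term so the solve stays numerically stable.
    Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
    Sw += 1e-3 * np.eye(X.shape[1])
    return np.linalg.solve(Sw, mu1 - mu0)

def predict(w, X, mu0, mu1):
    """Assign each sample to the nearer projected class mean."""
    z = X @ w
    return (np.abs(z - mu1 @ w) < np.abs(z - mu0 @ w)).astype(int)

def transductive_lda(Xs, ys, Xt, n_iter=5):
    """Self-training loop: fit LDA on the source set, pseudo-label the
    target set, then refit on source + pseudo-labeled target."""
    X, y = Xs, ys
    yt = None
    for _ in range(n_iter):
        w = lda_direction(X, y)
        mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
        yt = predict(w, Xt, mu0, mu1)
        X, y = np.vstack([Xs, Xt]), np.concatenate([ys, yt])
    return yt
```

Because the target samples (with their current pseudo-labels) enter the scatter and mean estimates on each refit, the learned projection adapts toward the target distribution, which is the transductive effect the abstract appeals to; STTLDA additionally imposes sparsity on the learned transform.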




Acknowledgments

The authors would like to thank the anonymous reviewers for their useful comments and valuable suggestions.

Author information

Correspondence to Wenming Zheng.

Additional information

This work was partly supported by the National Basic Research Program of China under Grants 2015CB351704 and 2011CB302202, the National Natural Science Foundation of China (NSFC) under Grants 61231002 and 61201444, the Ph.D. Program Foundation of the Ministry of Education of China under Grant 20120092110054, the Natural Science Foundation of Jiangsu Province under Grant BK20130020, and the Graduate Research Innovation Project of Jiangsu Province under Grant KYZZ15_0055.


About this article


Cite this article

Zong, Y., Zheng, W., Huang, X. et al. Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis. J Multimodal User Interfaces 10, 163–172 (2016). https://doi.org/10.1007/s12193-015-0210-7
