
A facial expression recognition method based on ensemble of 3D convolutional neural networks

  • Original Article
  • Published in Neural Computing and Applications

Abstract

This paper proposes a general framework for 3D convolutional neural networks. The framework defines five kinds of layers: convolutional, max-pooling, dropout, Gabor, and optical flow layers. General rules for designing 3D convolutional neural networks are discussed, and four specific networks are designed for facial expression recognition. The decisions of the four networks are fused together. The single networks and the ensemble network are evaluated on the Extended Cohn–Kanade dataset and achieve accuracies of 92.31% and 96.15%, respectively; the ensemble network obtains an accuracy of 61.11% on the FEEDTUM dataset. A reusable open-source project called 4DCNN is released. Based on this project, implementing 3D convolutional neural networks for specific tasks is convenient.
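The authors release their actual implementation as the 4DCNN project; purely as an illustrative sketch (not the authors' code), the core operations named in the abstract can be written in a few lines of numpy: a 3D convolution that filters a stack of video frames jointly over time, height, and width, a 3D max-pooling step, and decision-level fusion of several networks' class probabilities. All function names here are hypothetical.

```python
import numpy as np

def conv3d(volume, kernel):
    """Naive valid-mode 3D convolution (cross-correlation) of a
    (time, height, width) volume with a spatiotemporal kernel."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i+t, j:j+h, k:k+w] * kernel)
    return out

def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling; trailing elements that do not
    fill a full window are trimmed."""
    T, H, W = volume.shape
    T, H, W = T - T % size, H - H % size, W - W % size
    v = volume[:T, :H, :W].reshape(T // size, size, H // size, size, W // size, size)
    return v.max(axis=(1, 3, 5))

def late_fusion(probs):
    """Average class-probability vectors from several networks --
    one simple way to fuse the decisions of an ensemble."""
    return np.mean(probs, axis=0)

clip = np.random.rand(9, 32, 32)   # 9 frames of 32x32 grayscale video
kern = np.random.rand(3, 3, 3)     # one spatiotemporal filter
# conv -> ReLU -> pool, as in a typical 3D CNN stage
features = max_pool3d(np.maximum(conv3d(clip, kern), 0))
print(features.shape)              # (3, 15, 15)

# e.g. three hypothetical networks voting over 6 expression classes
p = late_fusion([np.full(6, 1 / 6.0), np.eye(6)[2], np.eye(6)[2]])
print(p.argmax())                  # 2
```

In a real 3D CNN each layer applies many such filters and the convolution is vectorized; the loop form above is only meant to make the (time, height, width) windowing explicit.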



Acknowledgements

This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61375007, 61373063, 61233011, 91420201, and 61472187, and by the National Basic Research Program of China under Grant No. 2014CB349303.

Author information

Corresponding author

Correspondence to Zhong Jin.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Sun, W., Zhao, H. & Jin, Z. A facial expression recognition method based on ensemble of 3D convolutional neural networks. Neural Comput & Applic 31, 2795–2812 (2019). https://doi.org/10.1007/s00521-017-3230-2
