Abstract
In this paper, a general framework for 3D convolutional neural networks is proposed. The framework defines five kinds of layers: convolutional, max-pooling, dropout, Gabor, and optical flow layers. General rules for designing 3D convolutional neural networks are discussed, and four specific networks are designed for facial expression recognition. The decisions of the four networks are fused into an ensemble. The single networks and the ensemble network are evaluated on the Extended Cohn–Kanade dataset and achieve accuracies of 92.31% and 96.15%, respectively. The ensemble network obtains an accuracy of 61.11% on the FEEDTUM dataset. A reusable open-source project called 4DCNN is released; based on this project, implementing 3D convolutional neural networks for specific tasks becomes convenient.
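The abstract states that the decisions of the four networks are fused, without specifying the fusion rule. A common choice for this kind of decision-level fusion is averaging the per-class probability outputs of the individual networks and taking the arg-max; the sketch below illustrates that scheme with numpy. The class probabilities and the six-class layout are hypothetical, and this is not necessarily the rule used in the paper:

```python
import numpy as np

def fuse_decisions(probs_list):
    """Average the per-class probability vectors produced by several
    networks and return the index of the highest-scoring class along
    with the fused probability vector."""
    fused = np.mean(np.stack(probs_list, axis=0), axis=0)
    return int(np.argmax(fused)), fused

# Hypothetical softmax outputs of four networks over six expression classes
net_outputs = [
    np.array([0.10, 0.60, 0.05, 0.10, 0.10, 0.05]),
    np.array([0.20, 0.50, 0.10, 0.05, 0.10, 0.05]),
    np.array([0.05, 0.70, 0.05, 0.05, 0.10, 0.05]),
    np.array([0.15, 0.55, 0.10, 0.05, 0.10, 0.05]),
]
label, fused = fuse_decisions(net_outputs)  # all four networks favor class 1
```

Averaging probabilities (rather than hard majority voting) lets a network that is confidently right outvote networks that are weakly wrong, which is one reason the ensemble can outperform each single network.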
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant Nos. 61375007, 61373063, 61233011, 91420201, and 61472187, and by the National Basic Research Program of China under Grant No. 2014CB349303.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Sun, W., Zhao, H. & Jin, Z. A facial expression recognition method based on ensemble of 3D convolutional neural networks. Neural Comput & Applic 31, 2795–2812 (2019). https://doi.org/10.1007/s00521-017-3230-2