
EmNet: a deep integrated convolutional neural network for facial emotion recognition in the wild

Applied Intelligence

Abstract

In the past decade, facial emotion recognition (FER) research has seen tremendous progress, which has led to the development of novel convolutional neural network (CNN) architectures for automatic recognition of facial emotions in static images. Although these networks achieve good recognition accuracy, they incur high computational cost and memory utilization. These issues restrict their deployment in real-world applications, which require FER systems to run in real time on resource-constrained embedded devices. To alleviate these issues and to develop a robust and efficient method for real-time automatic recognition of facial emotions in the wild, this paper presents a novel deep integrated CNN model named EmNet (Emotion Network). The EmNet model consists of two structurally similar deep CNN (DCNN) models and their integrated variant, which are jointly optimized. For a given facial image, EmNet produces three predictions, which are fused using two fusion schemes, namely average fusion and weighted maximum fusion, to obtain the final decision. To test the efficiency of the proposed FER pipeline on a resource-constrained embedded platform, we optimized the EmNet model and the face detector using the TensorRT SDK and deployed the complete FER pipeline on the Nvidia Xavier device. With 4.80M parameters and a model size of 19.3MB, the proposed EmNet model attains a notable improvement in accuracy over the current state of the art, together with a multi-fold improvement in computational efficiency.
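The two fusion schemes named in the abstract can be illustrated with a small sketch. The abstract does not give their exact definitions, so the code below is only one plausible reading: average fusion takes the element-wise mean of the three class-probability vectors, and weighted maximum fusion scales each vector by a per-model weight and then takes the element-wise maximum. The function names, the weights, and the example probabilities are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Prediction = Sequence[float]  # one class-probability vector per model


def _argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def average_fusion(preds: Sequence[Prediction]) -> Tuple[List[float], int]:
    """Assumed reading: element-wise mean of the predictions, then argmax."""
    n = len(preds)
    fused = [sum(column) / n for column in zip(*preds)]
    return fused, _argmax(fused)


def weighted_max_fusion(
    preds: Sequence[Prediction], weights: Sequence[float]
) -> Tuple[List[float], int]:
    """Assumed reading: scale each prediction by its model weight,
    take the element-wise maximum across models, then argmax."""
    scaled = [[w * p for p in pred] for pred, w in zip(preds, weights)]
    fused = [max(column) for column in zip(*scaled)]
    return fused, _argmax(fused)


# Hypothetical outputs of the two DCNNs and their integrated variant
# for a single image over three emotion classes.
preds = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
]

avg_scores, avg_label = average_fusion(preds)
max_scores, max_label = weighted_max_fusion(preds, [1.0, 1.0, 1.0])
print(avg_label, max_label)
```

With these made-up numbers the two schemes disagree: averaging favours the class that two of the three models prefer, while the maximum rule follows the single most confident model. This is the kind of difference that would motivate evaluating both schemes.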





Acknowledgements

The authors would like to thank the Director, CSIR-CEERI, Pilani for supporting and encouraging research activities at CSIR-CEERI, Pilani. Constant motivation by the group head, Cognitive Computing Group at CSIR-CEERI is also acknowledged.

Author information

Corresponding author

Correspondence to Sumeet Saurav.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Saurav, S., Saini, R. & Singh, S. EmNet: a deep integrated convolutional neural network for facial emotion recognition in the wild. Appl Intell 51, 5543–5570 (2021). https://doi.org/10.1007/s10489-020-02125-0

  • DOI: https://doi.org/10.1007/s10489-020-02125-0
