Abstract
In the past decade, facial emotion recognition (FER) research has seen tremendous progress, leading to the development of novel convolutional neural network (CNN) architectures for automatic recognition of facial emotions in static images. Although these networks achieve good recognition accuracy, they incur high computational costs and memory utilization. These issues restrict their deployment in real-world applications, which require FER systems to run in real time on resource-constrained embedded devices. To alleviate these issues and to develop a robust and efficient method for automatic recognition of facial emotions in the wild with real-time performance, this paper presents a novel deep integrated CNN model named EmNet (Emotion Network). The EmNet model consists of two structurally similar DCNN models and their integrated variant, which are jointly optimized. For a given facial image, EmNet produces three predictions, which are fused using two fusion schemes, namely average fusion and weighted maximum fusion, to obtain the final decision. To test the efficiency of the proposed FER pipeline on a resource-constrained embedded platform, we optimized the EmNet model and the face detector using the TensorRT SDK and deployed the complete FER pipeline on the Nvidia Xavier device. Our proposed EmNet model, with 4.80M parameters and a 19.3MB model size, attains a notable improvement in accuracy over the current state of the art, with a multi-fold improvement in computational efficiency.
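The two fusion schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define either scheme, so the formulas here are assumptions: average fusion is taken to be the element-wise mean of the three class-score vectors, and weighted maximum fusion is taken to scale each prediction by a per-branch weight and keep the per-class maximum. The function names, the example scores, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_fusion(preds: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-branch class scores (assumed definition)."""
    n = len(preds)
    return [sum(col) / n for col in zip(*preds)]


def weighted_max_fusion(preds: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Scale each branch by its weight, then keep the per-class maximum.

    This is an assumed reading of "weighted maximum fusion"; the paper's
    exact formulation may differ.
    """
    if len(preds) != len(weights):
        raise ValueError("need exactly one weight per prediction vector")
    return [max(w * p for w, p in zip(weights, col)) for col in zip(*preds)]


def decide(scores: Sequence[float]) -> int:
    """Index of the highest fused score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical softmax outputs of the three predictions for a 3-class toy case.
preds = [
    [0.6, 0.3, 0.1],  # first DCNN branch
    [0.2, 0.7, 0.1],  # second DCNN branch
    [0.5, 0.4, 0.1],  # integrated variant
]
weights = [0.25, 0.25, 0.5]  # hypothetical weights favouring the integrated variant

avg_class = decide(average_fusion(preds))
wmax_class = decide(weighted_max_fusion(preds, weights))
print(avg_class, wmax_class)
```

With these made-up numbers the two schemes choose different classes, which shows why the choice of fusion rule can change the final decision.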
Acknowledgements
The authors would like to thank the Director, CSIR-CEERI, Pilani for supporting and encouraging research activities at CSIR-CEERI, Pilani. Constant motivation by the group head, Cognitive Computing Group at CSIR-CEERI is also acknowledged.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Saurav, S., Saini, R. & Singh, S. EmNet: a deep integrated convolutional neural network for facial emotion recognition in the wild. Appl Intell 51, 5543–5570 (2021). https://doi.org/10.1007/s10489-020-02125-0
DOI: https://doi.org/10.1007/s10489-020-02125-0