Abstract
Human emotion recognition has attracted researchers' attention owing to its potential applications: identifying consumers' mood and interest in a product, assessing learners' emotional states, building smart cars for the automotive industry, and detecting a person's mental state in health care. In this paper, a well-designed committee network that exploits deep features for human emotion recognition from facial expressions is proposed. The architecture benefits from multi-level feature extraction using multiple filters, which improves the performance of the network. The designed variant of the inception–residual structure lets input data flow through multiple paths, so emotion variations are explicitly captured by multi-path sibling layers and their outputs are concatenated for recognition. The proposed algorithm is evaluated on the eNTERFACE, SAVEE and AFEW databases, attaining accuracies of 94.76%, 98.67% and 66.84%, respectively.
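The multi-path idea described above can be illustrated with a minimal NumPy sketch: several sibling paths with different receptive fields process the same input, their responses are combined, and a residual shortcut adds the input back. This is only an illustrative toy, not the paper's network; the helper `conv2d`, the averaging kernels, and the mean-based fusion are assumptions chosen for brevity.

```python
import numpy as np

def conv2d(x, k):
    # 'same'-padded 2-D cross-correlation on a single-channel image
    # (hypothetical helper standing in for a learned convolution layer)
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def inception_residual_block(x):
    # Three sibling paths with 1x1, 3x3 and 5x5 receptive fields,
    # here realized as simple averaging filters for illustration
    paths = [conv2d(x, np.ones((k, k)) / (k * k)) for k in (1, 3, 5)]
    features = np.stack(paths, axis=0)   # concatenate path outputs
    fused = features.mean(axis=0)        # stand-in for a 1x1 fusion layer
    return fused + x                     # residual shortcut

img = np.random.rand(8, 8)
out = inception_residual_block(img)
print(out.shape)  # (8, 8): spatial size is preserved by 'same' padding
```

In a real network each path would be a learned convolution with nonlinearity and the fusion a trained 1x1 convolution over the concatenated channels; the sketch only shows how multiple receptive fields and a shortcut connection compose.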
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Russel, N.S., Selvaraj, A. Robust affect analysis using committee of deep convolutional neural networks. Neural Comput & Applic 34, 3633–3645 (2022). https://doi.org/10.1007/s00521-021-06632-0