Displays, Volume 75, December 2022, 102330

The decadal perspective of facial emotion processing and recognition: A survey

https://doi.org/10.1016/j.displa.2022.102330

Highlights

  • The article divides the fundamental FER procedure into distinct stages for clear understanding.

  • The significance of each stage of FER, including face detection and tracking, facial feature extraction from static and dynamic images, and facial expression classification, is addressed together with the relevant algorithms.

  • The existing state-of-the-art deep neural networks for FER, including the convolutional neural network, the deep belief network, the deep autoencoder, and the recurrent neural network, are also presented in this article.

  • This article discusses challenges and recommendations, namely the deficiency of datasets, bias and inconsistency in datasets, and the integration of robust models.

  • Finally, the article also focuses on multimodal approaches for effective recognition, FER with technology for sustainable health, and customized portable devices for FER.

Abstract

Facial expression recognition (FER) plays a crucial role in diagnosing distinct psychological disorders, in human–machine interaction, and in a multitude of multimedia applications. The transition of FER from laboratory to in-the-wild conditions, together with significant advances in deep learning, has led to the implementation of automatic FER. In this article, we provide a review of FER that covers Ekman’s six basic emotions, the significance of FER together with its datasets, and deep learning algorithms. The article divides the fundamental FER procedure into distinct stages for clear understanding. The significance of each stage, including face detection and tracking, facial feature extraction from static and dynamic images, and facial expression classification, is addressed together with the relevant algorithms. The existing state-of-the-art deep neural networks for FER, including the convolutional neural network (CNN), the deep belief network (DBN), the deep autoencoder (DAE), and the recurrent neural network (RNN), are also presented. Finally, the article offers challenges and recommendations, namely the deficiency of datasets, bias and inconsistency in datasets, integration of robust models, multimodal approaches for effective recognition, FER with technology for sustainable health, edge-computing-powered devices for FER implementation, adoption of FER-based human-interaction robots, and customized portable devices for FER.

Introduction

Facial expression is one of the most fundamental, influential, natural, and universal gestures through which human beings express their emotional states and intentions [1], [2]. Facial emotion recognition (FER) has gained significant interest in the fields of human–computer interaction, medical treatment, sociable robots, and driver warning systems.

Additionally, FER plays a prominent role in the analysis of distinct mental health conditions by psychologists and psychiatrists. FER has played a key role in identifying Autism Spectrum Disorder (ASD) in children [3], [4]. FER delivered prominent results in recognizing bipolar disorder (BD) by evaluating the facial emotion recognition of patients and their healthy parents [5], [6]. Recently, a study utilized FER to evaluate the alexithymia hypothesis in ASD across a spectrum of complex emotions and response times [7]. FER has also been applied to investigate emotion recognition in maltreated children and adolescents [8], [9].

With the advancements in artificial intelligence (AI) and computer vision (CV), the field of human behavioural prediction and analysis, especially of human emotion, has evolved significantly. In the twentieth century, Ekman and Friesen outlined six fundamental facial expressions on the basis of a cross-cultural study, demonstrating that people, regardless of culture, perceive these six fundamental emotions, namely happiness, surprise, anger, fear, sadness, and disgust, in a similar way [10], [11]. More recent research in psychology and neuroscience has argued that the perception of these six fundamental emotions is in fact culture-specific [12], [13]. Nevertheless, owing to this pioneering research and to its clear, intuitive description of facial expressions, FER still largely follows the categorical model, which describes emotions in terms of discrete basic emotion categories. Concerning feature representation, FER is categorized into two types: dynamic-sequence FER and static-image FER. Dynamic-sequence FER takes into consideration the temporal relationships between adjacent frames of the input facial expression sequence [14], [15], whereas static-image FER encodes only the spatial information of a single image [16], [17], [18]. Additionally, multimodal systems [19] consider audio and physiological signals alongside these two vision-based methods.

Recently, a study proposed a novel FER approach built on graph mining, in which the face region is represented as a graph of edges and nodes and the gSpan algorithm is applied to discover frequent sub-graphs in a graph database [20]. A FER framework based on hybrid features that can be trained for offline and real-time applications has also been implemented [21]. The majority of conventional FER approaches have focused on handcrafted features or shallow learning, such as non-negative matrix factorization (NMF) [22] and sparse learning [23]. However, the development of two real-world datasets, namely Emotion Recognition in the Wild (EmotiW) [24] and FER2013 [25], collectively triggered the transition of FER from the laboratory environment to the wild. Moreover, significant advances in chip computing and well-structured network architectures have encouraged the use of deep learning models to achieve the utmost accuracy [26], [27], [28]. A multimodal automatic emotion recognition (AER) framework based on a convolutional neural network (CNN) achieved maximum accuracy in classifying emotions [29].

The contributions of the study are as follows.

  • a. An overview of facial emotion recognition (FER), including Ekman’s six basic emotions and facial expression recognition databases, is provided.

  • b. The function of face detection and tracking in FER is addressed, and significant algorithms related to facial recognition are presented.

  • c. The extraction of facial features from static images based on geometric and appearance features is explained, as is the extraction of facial features from dynamic images based on optical flow and feature point tracking.

  • d. For feature reduction, the functions of the AdaBoost, principal component analysis (PCA), and local Fisher discriminant analysis (LFDA) algorithms are discussed.

  • e. For facial expression classification, the significance of the hidden Markov model (HMM), the artificial neural network (ANN), the Bayesian network (BN), the k-nearest neighbor (KNN) classifier, and the support vector machine (SVM) is presented.

  • f. The mechanisms of pre-processing, face alignment, and data augmentation for deep facial expression recognition are discussed.

  • g. The convolutional neural network (CNN), deep belief network (DBN), deep autoencoder (DAE), and recurrent neural network (RNN) are presented as deep networks for feature learning.

  • h. Deficiency of datasets, bias and inconsistency in datasets, integration of robust models, multimodal approaches for effective recognition, FER with technology for sustainable health, edge-computing-powered devices for FER implementation, adoption of FER-based human-interaction robots, and customized portable devices for FER are the challenges and recommendations addressed in this article.

The organization of the study is as follows: Section 2 provides the methodology, Section 3 presents Ekman’s six basic emotions, and Section 4 gives an overview of FER together with the available databases. Section 5 covers face detection and tracking, and Section 6 covers methods for extracting facial features from static and dynamic images. Section 7 covers feature reduction, and Section 8 covers facial expression classification. Section 9 covers deep learning-based FER, and face normalization is covered in Section 10. Deep networks for feature learning are covered in Section 11, and challenges and recommendations are covered in Section 12. The article closes with the conclusion and references.


Methodology

In this section, we present the materials and procedures used to conduct this review of facial emotion processing and recognition. The materials and techniques are presented in the following order: search approach, selection criterion, data acquisition and retrieval, and data assessment. This review focuses mostly on the progress of facial emotion processing and recognition. The primary research question framed is “What is the progress of techniques for facial emotion processing and recognition?” Based on

Ekman’s six basic emotions

Facial expressions are most commonly described with the help of the six basic emotions, i.e., fear, disgust, anger, happiness, sadness, and surprise. All six of these expressions are known as universal expressions. Fig. 2 shows the six basic emotions [18], right to left from the top row: sad, happy, disgust and surprise, anger, and fear. They are known as universal expressions because they are found universally among human beings in all cultures, and

Overview of FER

Facial emotion recognition (FER) is the procedure of recognizing human emotion through verbal expressions, facial expressions, body movements, and multiple physiological signals. Fig. 3 shows a basic FER system. Usually, a basic FER program consists of two main steps: facial abstraction and facial expression recognition. The principal aim of this paper is to cover recent developments in each of these steps, i.e., abstracting facial expressions and classifying

Face detection and tracking

The primary and crucial starting phase of FER is face detection. At this stage, the picture is divided into two parts: one that includes faces and the other that depicts non-face regions. The key characteristics used to describe a face in a video frame include facial structure, form, skin tone, and expression. One algorithm used for face detection is the Haar classifier [48], [54], [55], [56]. Fig. 6 depicts the operation of Haar features for object recognition [57], where scaling the
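
As a concrete illustration of this stage, the sketch below runs Haar-cascade face detection with OpenCV's bundled frontal-face model. It is a minimal example, not the article's exact pipeline; the file name `input.jpg` and the parameter values are illustrative assumptions.

```python
import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade
# (shipped with the opencv-python package).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read an image and convert to grayscale: Haar features
# operate on intensity values, not color.
image = cv2.imread("input.jpg")  # illustrative file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Scan the image at multiple scales; scaleFactor and minNeighbors
# are typical (assumed) values trading detection rate against
# false positives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each detection is an (x, y, w, h) face region; later FER stages
# (feature extraction, classification) operate on these crops.
for (x, y, w, h) in faces:
    face_crop = gray[y:y + h, x:x + w]
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```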

Extraction methods of the facial features for static images

For fixed or static photos, two types of facial feature extraction methods are available: 1) geometric-feature-based methods and 2) appearance-based methods.
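
As one appearance-based example, the sketch below computes a local binary pattern (LBP) histogram, an appearance descriptor widely studied for FER (see Ojala et al. and Shan et al. in the references). The radius, number of sampling points, and the random stand-in input are illustrative assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, points=8, radius=1):
    """Appearance-based descriptor: uniform LBP histogram of a
    grayscale face crop (parameter values assumed)."""
    lbp = local_binary_pattern(gray_face, P=points, R=radius, method="uniform")
    # "Uniform" LBP with P sampling points yields P + 2 distinct codes.
    n_bins = points + 2
    hist, _ = np.histogram(lbp.ravel(), bins=n_bins, range=(0, n_bins))
    # Normalize so the descriptor is independent of crop size.
    return hist.astype(np.float64) / (hist.sum() + 1e-7)

# Usage: feed a detected face crop (e.g., from the Haar detector
# above); the histogram then goes to a classifier.
crop = np.random.randint(0, 256, (64, 64)).astype(np.uint8)  # stand-in crop
features = lbp_histogram(crop)
```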

Feature reduction

A complicated issue in pattern recognition problems, such as the analysis of facial emotions, is the enormous dimensionality of the input feature vectors. The logical next step is to reduce the difficulty of real-time calculation by applying feature reduction techniques to the set of discriminative features. Most of the existing strategies outlined in Table 7 restrict the final selection of features to a subset of emotional classes such as sorrow,
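
To make the dimensionality argument concrete, here is a minimal, assumed sketch of PCA-based feature reduction with scikit-learn; the random feature matrix and the choice of 50 components are illustrative, not taken from the article.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 500 face samples, each a 4096-dimensional
# feature vector (e.g., a flattened 64x64 crop or stacked LBP bins).
X = np.random.rand(500, 4096)

# Project onto the 50 directions of highest variance; the component
# count is an assumed hyperparameter, often chosen to retain a
# target fraction of the total variance.
pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (500, 50)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```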

Facial expression classification

After the deep features have been assessed, the final step of FER is to assign a given face to one of the fundamental emotion categories. In comparison with traditional methods, deep networks can perform FER end-to-end, combining the feature abstraction procedure with the classification function. To back-propagate errors, a loss layer is appended to the end of the network; otherwise, the predicted likelihood of each sample is indicated explicitly. Softmax loss is the most common function used in CNNs, which
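
For reference, the snippet below is a minimal NumPy sketch of the softmax cross-entropy loss mentioned here, computed over a batch of class scores; the batch contents and label values are illustrative.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Softmax loss over a batch: logits is (batch, n_classes),
    labels holds the integer emotion class of each sample."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Negative log-likelihood of the correct class, averaged over the batch.
    batch = np.arange(len(labels))
    return -np.log(probs[batch, labels] + 1e-12).mean()

# Illustrative batch: 4 samples, 6 basic-emotion classes.
scores = np.random.randn(4, 6)
y_true = np.array([0, 3, 5, 1])
print(softmax_cross_entropy(scores, y_true))
```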

Deep learning-based FER

Much progress has lately been made in deep learning [116] systems used in the information sciences, such as CNNs and RNNs. These algorithms have a wide variety of applications, including attribute extraction, identification, and surveillance operations. As shown in Fig. 15, a CNN includes three different layers: a convolution sheet, a maximum pooling
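
The sketch below is a minimal, assumed PyTorch version of such a CNN, stacking convolution and max-pooling layers before a fully connected output over the six basic emotions; the filter counts, the 48x48 grayscale input size (FER2013-style), and the other hyperparameters are illustrative choices, not the article's architecture.

```python
import torch
import torch.nn as nn

class TinyFERNet(nn.Module):
    """Minimal CNN for FER: conv -> max-pool blocks followed by a
    fully connected classifier (hyperparameters assumed)."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 48x48 -> 48x48
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 12x12
        )
        self.classifier = nn.Linear(64 * 12 * 12, n_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Usage: a batch of four 48x48 grayscale face crops.
model = TinyFERNet()
logits = model(torch.randn(4, 1, 48, 48))
print(logits.shape)  # torch.Size([4, 6])
```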

Face normalization

Differences in lighting and head pose cause significant changes in the photos and thereby reduce FER efficiency. We therefore cover two different strategies for normalizing the face against these variations: illumination normalization and pose normalization.
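
As an assumed illustration of the illumination side, the snippet below applies global histogram equalization and CLAHE, two common lighting-normalization steps available in OpenCV, to a grayscale face crop; the article does not prescribe these specific operators, and the random input is a stand-in.

```python
import cv2
import numpy as np

# Illustrative grayscale face crop (stand-in for a real image).
face = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

# Global histogram equalization spreads intensities over the full
# range, reducing the effect of overall brightness differences.
equalized = cv2.equalizeHist(face)

# CLAHE (contrast-limited adaptive histogram equalization)
# normalizes lighting locally, tile by tile, which copes better
# with uneven illumination across the face.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
normalized = clahe.apply(face)
```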

Deep networks for feature learning

Deep learning has been a burning topic among researchers in recent years, and in some implementations it has produced state-of-the-art results [150], [151]. Deep learning builds hierarchical structures with numerous nonlinear transformations of the representation to capture high-level abstractions. In this segment, we present the deep learning techniques applied to FER.

Challenges and recommendations

In this section, we discuss the challenges identified in the review and suggest recommendations to enhance real-time FER. The challenges and recommendations are as follows:

  • Deficiency in the quality and quantity of datasets

The main focus of researchers is now on facial expression recognition that can tackle varying environmental conditions. To this end, the deep learning approach is utilized in many applications to

Conclusion

FER plays a significant role in identifying distinct conditions, namely autism spectrum disorder, bipolar disorder, and the effects of maltreatment on children and adolescents, through emotions. This article highlights the significance of FER and presents a detailed review of it. Recent advancements in computer vision, AI, and deep learning have boosted the implementation of FER on a wide scale. In this article, concerning feature reduction, the implementation of AdaBoost, PCA, and LFDA algorithms is

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (187)

  • L.A. Cament et al., Face recognition under pose variation with local Gabor features enhanced by active shape and statistical models, Pattern Recognit. (2015).
  • Y. Chen et al., Regression-based active appearance model initialization for facial feature tracking with missing frames, Pattern Recognit. Lett. (2014).
  • T. Ojala et al., A comparative study of texture measures with classification based on featured distributions, Pattern Recognit. (1996).
  • C. Shan et al., Facial expression recognition based on local binary patterns: A comprehensive study, Image Vis. Comput. (2009).
  • E. Owusu et al., A neural-AdaBoost based facial expression recognition system, Expert Syst. Appl. (2014).
  • J. Li et al., Facial expression recognition using deep neural networks, in: IEEE International Conference on Imaging Systems and Techniques (IST) (2015).
  • W. Gu et al., Facial expression recognition using radial encoding of local Gabor features and classifier synthesis, Pattern Recognit. (2012).
  • R.A. Khan et al., Framework for reliable, real-time facial expression recognition for low resolution images, Pattern Recognit. Lett. (2013).
  • A. Sánchez et al., Differential optical flow applied to automatic facial expression recognition, Neurocomputing (2011).
  • H. Fang, Facial expression recognition in dynamic sequences: An integrated approach, Pattern Recognit. (2014).
  • C. Darwin, The expression of the emotions in man and animals (1872).
  • Y.L. Tian et al., Recognizing action units for facial expression analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2001).
  • A.T. Wieckowski et al., Measuring change in facial emotion recognition in individuals with autism spectrum disorder: A systematic review, Autism (2020).
  • S.I. Ulusoy et al., Facial emotion recognition deficits in patients with bipolar disorder and their healthy parents, Gen. Hosp. Psychiatry (2020).
  • L. Graumann et al., Facial emotion recognition in borderline patients is unaffected by acute psychosocial stress, J....
  • L. Ola et al., Facial emotion recognition in autistic adult females correlates with alexithymia, not autism, Autism (2020).
  • S. Saha, Feature selection for facial emotion recognition using cosine similarity-based harmony search algorithm, Appl. Sci. (2020).
  • G. Simcock et al., Associations between facial emotion recognition and mental health in early adolescence, Int. J....
  • P. Ekman et al., Constants across cultures in the face and emotion, J. Pers. Soc. Psychol. (1971).
  • P. Ekman, Strong evidence for universals in facial expressions: A reply to Russell’s mistaken critique, Psychol. Bull. (1994).
  • R.E. Jack et al., Facial expressions of emotion are not culturally universal, Proc. Natl. Acad. Sci. U.S.A. (2012).
  • D. Matsumoto, More evidence for the universality of a contempt expression, Motiv. Emot. (1992).
  • H. Jung et al., Joint fine-tuning in deep neural networks for facial expression recognition.
  • X. Zhao, Peak-piloted deep network for facial expression recognition.
  • A. Mollahosseini, D. Chan, and M.H. Mahoor, Going deeper in facial expression recognition using deep neural...
  • P. Liu et al., Facial expression recognition via a boosted deep belief network (2014).
  • N. Samadiani, A review on automatic facial expression recognition systems assisted by multimodal sensor data, Sensors (2019).
  • A. Alreshidi et al., Facial emotion recognition using hybrid features, Informatics (2020).
  • R. Zhi et al., Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition, IEEE Trans. Syst. Man Cybern. Part B (2010).
  • L. Zhong et al., Learning multiscale active facial patches for expression analysis, IEEE Trans. Cybern. (2014).
  • A. Dhall, R. Goecke, S. Ghosh, J. Joshi, J. Hoey, and T. Gedeon, From individual to group-level emotion recognition:...
  • I.J. Goodfellow et al., Challenges in representation learning: A report on three machine learning contests, in...
  • K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv Prepr....
  • C. Szegedy, Going deeper with convolutions.
  • K. He et al., Deep residual learning for image recognition.
  • M.F.H. Siddiqui et al., A multimodal facial emotion recognition framework through the fusion of speech with visible and infrared images, Multimodal Technol. Interact. (2020).
  • Z. Zeng et al., A survey of affect recognition methods: Audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell. (2008).
  • R. Saran, S. Haricharan, and N. Praveen, Facial emotion recognition using deep convolutional neural networks, Int. J....
  • Y.-L. Tian et al., Facial expression analysis, in: Handbook of Face Recognition, Springer (2005).
  • C.-D. Căleanu, Face expression recognition: A brief overview of the last decade.

This paper was recommended for publication by Prof. Guangtao Zhai.