
Neurocomputing

Volume 273, 17 January 2018, Pages 643-649

Facial expression recognition via learning deep sparse autoencoders

https://doi.org/10.1016/j.neucom.2017.08.043

Abstract

Facial expression recognition is an important research issue in the field of pattern recognition. In this paper, we present a novel framework for facial expression recognition that automatically distinguishes expressions with high accuracy. In particular, a high-dimensional feature, formed by combining facial geometric and appearance features, is introduced to facial expression recognition because it carries accurate and comprehensive information about emotions. Furthermore, deep sparse autoencoders (DSAE) are established to recognize facial expressions with high accuracy by learning robust and discriminative features from the data. The experimental results indicate that the presented framework achieves a recognition accuracy of 95.79% on the extended Cohn–Kanade (CK+) database for seven facial expressions, outperforming three other state-of-the-art methods by as much as 3.17%, 4.09%, and 7.41%, respectively. The presented approach is also applied to recognize eight facial expressions (including the neutral one), where it again provides a satisfactory recognition accuracy, demonstrating the feasibility and effectiveness of the approach.

Introduction

Facial expression, as one of the most significant means for human beings to convey their emotions and intentions during communication, plays a significant role in human interfaces. In recent years, facial expression recognition has been under intensive investigation, owing to its vital applications in various fields including virtual reality, intelligent tutoring systems, health care, and data-driven animation [1], [5], [12], [46]. The main goal of facial expression recognition is to identify the human emotional state (e.g., anger, contempt, disgust, fear, happiness, sadness, and surprise [11]) from given facial images. It should be pointed out that automatically recognizing facial expressions with high accuracy is a challenging task. On the one hand, it is difficult to find the similarity of the same emotional state across different persons, since they may express it in various ways. On the other hand, it is also hard to discern differences between expressions of the same person, because some emotional states are too subtle to discriminate. Nevertheless, several approaches have been proposed to automatically recognize facial expressions. Generally, these methods can be divided into two categories: feature-based approaches and template-based approaches [12], [27].

In this paper, we concentrate on the feature-based approach, where the expression information is extracted from appearance or geometric features [28], [34], [35]. Here, geometric features denote the locations and shapes of facial components, while appearance features express facial appearance changes such as furrows, gapes, wrinkles, and bulges. The most important step in facial expression recognition is to extract representative features from the original facial images so as to successfully distinguish different emotions. Obviously, the combination of geometric and appearance features can provide a more effective facial representation, because the collected features include not only the exact locations but also the skin changes. In addition, a high-dimensional feature has proved effective for face recognition, with performance superior to both low-dimensional features and the state of the art in most cases [3]. Inspired by this idea, we introduce the high-dimensional feature into facial expression recognition in the hope of presenting a novel and powerful method. To date, a variety of statistical features have been put forward and applied to expression detection, such as local binary patterns (LBP) [31], scale-invariant feature transformation (SIFT) [17], and Gabor filters [33], [49]. In particular, the histogram of oriented gradients (HOG) [7], [13], a good descriptor of local appearance and shape, has been exploited for expression analysis in recent years. In this paper, we first locate the accurate positions of dense facial landmarks with a face alignment method. After that, the high-dimensional feature is formed by concatenating all descriptors extracted from patches centered around the landmarks. Specifically, three different descriptors, namely HOG, LBP, and gray value, are selected and evaluated in this paper.
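As a concrete illustration of this feature-construction step, the sketch below builds one high-dimensional vector by concatenating gray values, an LBP histogram, and a HOG-like orientation histogram from patches centered on landmarks. The landmark coordinates, the 16×16 patch size, and the simplified (single-cell, non-interpolated) descriptors are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lbp_codes(patch):
    """Basic 8-neighbour LBP over interior pixels (simplified, no interpolation)."""
    c = patch[1:-1, 1:-1]
    neighbours = [patch[0:-2, 0:-2], patch[0:-2, 1:-1], patch[0:-2, 2:],
                  patch[1:-1, 2:],   patch[2:, 2:],     patch[2:, 1:-1],
                  patch[2:, 0:-2],   patch[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def hog_like(patch, bins=9):
    """Coarse gradient-orientation histogram over the whole patch
    (HOG without the cell/block structure of the full descriptor)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def landmark_feature(image, landmarks, half=8):
    """Concatenate gray values, LBP histogram and HOG-like histogram
    from a patch centred on each landmark into one long vector."""
    parts = []
    for (x, y) in landmarks:
        patch = image[y - half:y + half, x - half:x + half]
        gray = patch.ravel() / 255.0
        lbp_hist, _ = np.histogram(lbp_codes(patch), bins=256, range=(0, 256))
        parts.append(np.concatenate([gray, lbp_hist / lbp_hist.sum(),
                                     hog_like(patch)]))
    return np.concatenate(parts)

# Toy example: a random 64x64 "face" with three hypothetical landmark positions.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
feat = landmark_feature(img, [(16, 16), (32, 32), (48, 48)])
print(feat.shape)   # (1563,) = 3 landmarks * (256 gray + 256 LBP + 9 HOG)
```

In the paper's setting the patches would be extracted around the dense landmarks returned by the face alignment step, so the final vector is far longer than this toy one.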

Deep sparse autoencoders (DSAE), one of the deep learning models, have been extensively researched and widely applied in many fields [14], [37]. In particular, a DSAE is a deep neural network built by stacking sparse autoencoders, with a softmax classifier generally chosen as the output layer for classification problems [32], [36]. The highlight of the DSAE is that it can extract useful features by unsupervised learning; that is, it retains only the crucial information of the data in robust and discriminative representations after detecting and removing input redundancies [26]. It should be mentioned that distinguishing different emotions regardless of the identity of the face is challenging, because of individual variations in the same expression and the subtle differences between expressions. In addition, external factors such as illumination, environment, and camera conditions further increase the difficulty of the recognition process. To overcome the challenges mentioned above, we establish a DSAE-based deep learning framework for facial expression recognition that classifies expressions with high accuracy by learning useful features from the data set.
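The sparsity mechanism behind each autoencoder in such a stack can be sketched as follows: the training cost combines reconstruction error, weight decay, and a KL-divergence penalty that drives each hidden unit's average activation towards a small target rho. All dimensions, hyper-parameter values (rho, beta, lam), and the toy batch below are assumptions for illustration, not the paper's reported settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_cost(W1, b1, W2, b2, X, rho=0.05, beta=3.0, lam=1e-4):
    """Cost of one sparse autoencoder on a batch X (samples in rows):
    squared reconstruction error + L2 weight decay + KL-divergence
    sparsity penalty on the mean hidden activations."""
    H = sigmoid(X @ W1 + b1)            # encoder
    Xr = sigmoid(H @ W2 + b2)           # decoder
    recon = 0.5 * np.mean(np.sum((Xr - X) ** 2, axis=1))
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = H.mean(axis=0)            # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat) +
                (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl

# Toy sizes standing in for the (much larger) facial feature vectors.
rng = np.random.default_rng(1)
n_in, n_hid = 64, 16
W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)
X = rng.random((32, n_in))
cost = sparse_ae_cost(W1, b1, W2, b2, X)
print(cost > 0)   # True
```

Minimizing this cost (e.g. by gradient descent) yields the robust, sparse hidden representations that the stacked layers then build on.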

The novelty and contributions of our work are primarily threefold. (1) A high-dimensional feature, the combination of facial geometric and appearance features, is introduced to facial expression recognition because it carries accurate and comprehensive information about emotions. (2) A DSAE-based deep learning framework is established for facial expression recognition with high accuracy by learning robust and discriminative features from the data set. (3) The presented DSAE-based approach is successfully applied to distinguish different facial expressions on the CK+ database. Note that our work focuses on 7-class and 8-class (including the neutral) facial expression recognition, which are more difficult tasks. The results show that the DSAE-based approach outperforms three other state-of-the-art approaches for 7-class recognition by as much as 3.17%, 4.09%, and 7.41%, respectively, and it also achieves a good performance with satisfactory accuracy for 8-class recognition.

The remainder of this paper is organized as follows. In Section 2, we present a detailed introduction to the sparse autoencoder and the deep sparse autoencoders, as well as their application to facial expression recognition. Section 3 mainly discusses the experimental results of facial expression recognition via the deep sparse autoencoders and evaluates its overall performance by comparison with three other state-of-the-art approaches. Finally, conclusions are summarized in Section 4.


Deep sparse autoencoders for facial expression recognition

In this section, we mainly introduce the sparse autoencoder and the deep neural network formed by stacked sparse autoencoders, which can learn discriminative features to distinguish the facial expressions.
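A minimal sketch of the stacked architecture's forward pass, assuming sigmoid encoder layers and a softmax output: each (W, b) pair stands for the encoder half of one pretrained sparse autoencoder, and the softmax layer on top yields class probabilities. The layer sizes and the 7-class output here are illustrative, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def dsae_forward(X, encoder_layers, Wout, bout):
    """Forward pass of a stacked-autoencoder network: chain the encoder
    halves, then classify the deepest representation with softmax."""
    H = X
    for W, b in encoder_layers:
        H = sigmoid(H @ W + b)
    return softmax(H @ Wout + bout)

# Toy dimensions standing in for the feature vector and 7 expression classes.
rng = np.random.default_rng(2)
dims = [128, 64, 32]                   # input -> hidden 1 -> hidden 2
layers = [(rng.normal(scale=0.1, size=(dims[i], dims[i + 1])),
           np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]
Wout = rng.normal(scale=0.1, size=(dims[-1], 7)); bout = np.zeros(7)
probs = dsae_forward(rng.random((4, dims[0])), layers, Wout, bout)
print(probs.shape, np.allclose(probs.sum(axis=1), 1.0))   # (4, 7) True
```

In practice each encoder layer would first be pretrained greedily as a sparse autoencoder, and the whole network then fine-tuned with the labelled expression data.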

Database description

We utilize the extended Cohn–Kanade (CK+) database to evaluate the proposed facial expression recognition framework. The CK+ database, released in 2010, is the extension of the Cohn–Kanade (CK) database, which has become one of the most widely used benchmarks for evaluating the recognition performance of algorithms [23]. In particular, the number of emotion categories in CK+ is increased to eight, and all emotion labels are amended and validated for the purpose of improving

Conclusions

In this paper, we have presented a novel approach for facial expression recognition using deep sparse autoencoders (DSAE), which can automatically distinguish the expressions with high accuracy. Both the facial geometric and appearance features have been introduced to compose a high-dimensional feature with accurate and comprehensive information of emotions. Particularly, the DSAE-based deep learning framework has been established for facial expression recognition to identify the expressions

Acknowledgments

This work was supported in part by the UK–China Industry Academia Partnership Programme under grant UK-CIAPP-276, in part by the Korea Foundation for Advanced Studies, in part by the Natural Science Foundation of China under grant 61403319, in part by the Fujian Natural Science Foundation under grant 2015J05131, and in part by the Fujian Provincial Key Laboratory of Eco-Industrial Green Technology.


References (49)

  • J. Zhang et al., Passivity analysis for discrete-time neural networks with mixed time-delays and randomly occurring quantization effects, Neurocomputing (2016)
  • W. Zhang et al., Event-based state estimation for a class of complex networks with time-varying delays: a comparison principle approach, Phys. Lett. A (2017)
  • K. Anderson et al., A real-time automated system for the recognition of human facial expressions, IEEE Trans. Syst. Man Cybern. Part B Cybern. (2006)
  • Y. Bengio et al., Greedy layer-wise training of deep networks, Adv. Neural Inf. Process. Syst. (2007)
  • D. Chen et al., Blessing of dimensionality: high-dimensional feature and its efficient compression for face verification, 2013 IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • H. Chen et al., Pinning controllability of autonomous Boolean control networks, Sci. China Inf. Sci. (2016)
  • T. Cootes et al., Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • N. Dalal et al., Histograms of oriented gradients for human detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
  • A. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. (1977)
  • D. Ding et al., Event-based security control for discrete-time stochastic systems, IET Control Theory Appl. (2016)
  • P. Ekman et al., Constants across cultures in the face and emotion, J. Pers. Social Psychol. (1971)
  • T. Gritti et al., Local features based facial expression recognition with face registration errors, The 8th IEEE International Conference on Automatic Face and Gesture Recognition (2008)
  • G. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • X. Huang et al., Spatiotemporal local monogenic binary patterns for facial expression recognition, IEEE Signal Process. Lett. (2012)

    Nianyin Zeng was born in Fujian Province, China, in 1986. He received the B.Eng. degree in electrical engineering and automation in 2008 and the Ph.D. degree in electrical engineering in 2013, both from Fuzhou University. From October 2012 to March 2013, he was a research assistant in the Department of Electrical and Electronic Engineering, the University of Hong Kong. Currently, he is an assistant professor with the Department of Instrumental & Electrical Engineering of Xiamen University. His current research interests include intelligent data analysis, computational intelligence, time-series modeling and applications. He is the author or co-author of several technical papers and also a very active reviewer for many international journals and conferences. Dr. Zeng is currently serving as an associate editor for Neurocomputing, and also an editorial board member for Biomedical Engineering Online (Springer), Journal of Advances in Biomedical Engineering and Technology, and Smart Healthcare.

    Hong Zhang received her bachelor’s degree in electrical engineering and automation from the Department of Mechanical & Electrical Engineering, Xiamen University, Xiamen, China, in 2015. She is currently pursuing the master’s degree in electrical testing technology and instruments at Xiamen University, Xiamen, China. Her research interests include image processing and deep learning techniques.

    Baoye Song received the B.S. degree in automation in 2005, the M.S. degree in control theory and control engineering in 2008 both from Qingdao University of Science and Technology, Qingdao, China, and the Ph.D. degree in control theory and control engineering in 2011 from Shandong University, Jinan, China. He has been with Shandong University of Science and Technology as a lecturer since 2011. His research interests include nonlinear filtering, wireless sensor network, mobile robot and fault diagnosis.

    Weibo Liu received his B.S. degree in electrical engineering from the Department of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK, in 2015. He is currently pursuing the Ph.D. degree in computer science at Brunel University London, London, UK. His research interests include big data analysis and deep learning techniques.

    Yurong Li was born in Fujian Province, China, in 1973. She received her master's degree in industry automation and Ph.D. in control theory and control engineering from Zhejiang University, Zhejiang, China, in 1997 and 2001, respectively. She is now a professor at Fuzhou University and, since 2007, a member of the Fujian Key Laboratory of Medical Instrumentation & Pharmaceutical Technology. Her research interests include biomedical instruments and intelligent information processing.

    Abdullah M. Dobaie received his B.Sc. in 1981 and M.Sc. in 1989, both in electronic and communication engineering from King Abdulaziz University in Saudi Arabia, and the Ph.D. degree in 1995 from Colorado State University in the USA. He has supervised many M.Sc. students and directed many projects concerning communication, digital filters, antennas and digital signal processing. His recent interests include adaptive communication systems, digital image processing, wave propagation and communication networks.
