
Neurocomputing

Volume 273, 17 January 2018, Pages 643-649

Facial expression recognition via learning deep sparse autoencoders

https://doi.org/10.1016/j.neucom.2017.08.043

Abstract

Facial expression recognition is an important research issue in the field of pattern recognition. In this paper, we present a novel framework for facial expression recognition that automatically distinguishes expressions with high accuracy. In particular, a high-dimensional feature, formed by combining facial geometric and appearance features, is introduced to facial expression recognition because it carries accurate and comprehensive information about emotions. Furthermore, deep sparse autoencoders (DSAE) are established to recognize facial expressions with high accuracy by learning robust and discriminative features from the data. The experimental results indicate that the presented framework achieves a recognition accuracy of 95.79% on the extended Cohn–Kanade (CK+) database for seven facial expressions, outperforming three other state-of-the-art methods by as much as 3.17%, 4.09%, and 7.41%, respectively. The presented approach is also applied to recognize eight facial expressions (including the neutral one), where it again provides a satisfactory recognition accuracy, demonstrating the feasibility and effectiveness of the approach.

Introduction

Facial expression, as one of the most significant means for human beings to convey their emotions and intentions during communication, plays a significant role in human interfaces. In recent years, facial expression recognition has been under intensive investigation, owing to its vital applications in various fields including virtual reality, intelligent tutoring systems, health care, and data-driven animation [1], [5], [12], [46]. The main goal of facial expression recognition is to identify the human emotional state (e.g., anger, contempt, disgust, fear, happiness, sadness, and surprise [11]) from given facial images. It should be pointed out that automatically recognizing facial expressions with high accuracy is a challenging task. On the one hand, it is difficult to find the similarity of the same emotional state across different persons, since they may express it in various ways. On the other hand, it is also hard to discern differences between expressions of the same person, because some emotional states are too subtle to discriminate. Nevertheless, several approaches have been proposed to automatically recognize facial expressions. Generally, these methods can be divided into two categories: feature-based approaches and template-based approaches [12], [27].

In this paper, we concentrate on the feature-based approach, where the expression information is extracted from appearance or geometric features [28], [34], [35]. Here, geometric features denote the locations and shapes of facial components, while appearance features express facial appearance changes such as furrows, gapes, wrinkles, and bulges. The most important step in facial expression recognition is to extract representative features from the original facial images so as to successfully distinguish different emotions. Obviously, the combination of geometric and appearance features can provide a more effective facial representation, because the collected features include not only the exact locations but also the skin changes. In addition, a high-dimensional feature has proved effective for face recognition, with performance superior to both low-dimensional features and the state of the art in most cases [3]. Inspired by this idea, we introduce the high-dimensional feature into facial expression recognition in the hope of presenting a novel and powerful method. To date, a variety of statistical features have been put forward and applied to expression detection, such as local binary patterns (LBP) [31], scale-invariant feature transformation (SIFT) [17], and Gabor filters [33], [49]. In particular, the histogram of oriented gradients (HOG) [7], [13], a good descriptor of local appearance and shape, has been exploited for expression analysis in recent years. In this paper, we first locate the accurate positions of dense facial landmarks with a face alignment method. After that, the high-dimensional feature is formed by concatenating all descriptors extracted from patches centered around the landmarks. Specifically, three different descriptors, namely HOG, LBP, and gray value, are selected and evaluated in this paper.
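As a concrete illustration of this feature-construction step, the sketch below builds one high-dimensional vector by concatenating gray values, an LBP histogram, and a HOG-like orientation histogram from patches centered on landmarks. The landmark coordinates, the 16×16 patch size, and the simplified (single-cell, non-interpolated) descriptors are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lbp_codes(patch):
    """Basic 8-neighbour LBP over interior pixels (simplified, no interpolation)."""
    c = patch[1:-1, 1:-1]
    neighbours = [patch[0:-2, 0:-2], patch[0:-2, 1:-1], patch[0:-2, 2:],
                  patch[1:-1, 2:],   patch[2:, 2:],     patch[2:, 1:-1],
                  patch[2:, 0:-2],   patch[1:-1, 0:-2]]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= ((n >= c).astype(np.uint8) << bit)
    return codes

def hog_like(patch, bins=9):
    """Coarse gradient-orientation histogram over the whole patch
    (HOG without the cell/block structure of the full descriptor)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-8)

def landmark_feature(image, landmarks, half=8):
    """Concatenate gray values, LBP histogram and HOG-like histogram
    from a patch centred on each landmark into one long vector."""
    parts = []
    for (x, y) in landmarks:
        patch = image[y - half:y + half, x - half:x + half]
        gray = patch.ravel() / 255.0
        lbp_hist, _ = np.histogram(lbp_codes(patch), bins=256, range=(0, 256))
        parts.append(np.concatenate([gray, lbp_hist / lbp_hist.sum(),
                                     hog_like(patch)]))
    return np.concatenate(parts)

# Toy example: a random 64x64 "face" with three hypothetical landmark positions.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
feat = landmark_feature(img, [(16, 16), (32, 32), (48, 48)])
print(feat.shape)   # (1563,) = 3 landmarks * (256 gray + 256 LBP + 9 HOG)
```

In the paper's setting the patches would be extracted around the dense landmarks returned by the face alignment step, so the final vector is far longer than this toy one.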

Deep sparse autoencoders (DSAE), one of the deep learning models, have been extensively researched and widely applied in many fields [14], [37]. In particular, a DSAE is a deep neural network built by stacking sparse autoencoders, with a softmax classifier generally chosen as the output layer for classification problems [32], [36]. The highlight of the DSAE is that it can extract useful features by unsupervised learning; that is, it retains only the crucial information of the data in robust and discriminative representations after detecting and removing input redundancies [26]. It should be mentioned that distinguishing different emotions regardless of the identity of the face is challenging, because of individual variations in the same expression and the subtle differences between expressions. In addition, external factors such as illumination, environment, and camera conditions further increase the difficulty of the recognition process. To overcome the challenges mentioned above, we establish a DSAE-based deep learning framework for facial expression recognition that classifies expressions with high accuracy by learning useful features from the data set.
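The sparsity mechanism behind each autoencoder in such a stack can be sketched as follows: the training cost combines reconstruction error, weight decay, and a KL-divergence penalty that drives each hidden unit's average activation towards a small target rho. All dimensions, hyper-parameter values (rho, beta, lam), and the toy batch below are assumptions for illustration, not the paper's reported settings.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_ae_cost(W1, b1, W2, b2, X, rho=0.05, beta=3.0, lam=1e-4):
    """Cost of one sparse autoencoder on a batch X (samples in rows):
    squared reconstruction error + L2 weight decay + KL-divergence
    sparsity penalty on the mean hidden activations."""
    H = sigmoid(X @ W1 + b1)            # encoder
    Xr = sigmoid(H @ W2 + b2)           # decoder
    recon = 0.5 * np.mean(np.sum((Xr - X) ** 2, axis=1))
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = H.mean(axis=0)            # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat) +
                (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return recon + decay + beta * kl

# Toy sizes standing in for the (much larger) facial feature vectors.
rng = np.random.default_rng(1)
n_in, n_hid = 64, 16
W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, n_in)); b2 = np.zeros(n_in)
X = rng.random((32, n_in))
cost = sparse_ae_cost(W1, b1, W2, b2, X)
print(cost > 0)   # True
```

Minimizing this cost (e.g. by gradient descent) yields the robust, sparse hidden representations that the stacked layers then build on.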

The novelty and contributions of our work are primarily threefold. (1) A high-dimensional feature, the combination of facial geometric and appearance features, is introduced to facial expression recognition because it carries accurate and comprehensive information about emotions. (2) A DSAE-based deep learning framework is established for facial expression recognition with high accuracy by learning robust and discriminative features from the data set. (3) The presented DSAE-based approach is successfully applied to distinguish different facial expressions on the CK+ database. Note that our work focuses on 7-class and 8-class (including the neutral) facial expression recognition, which are more difficult tasks. The results show that the DSAE-based approach outperforms three other state-of-the-art approaches for 7-class recognition by as much as 3.17%, 4.09%, and 7.41%, respectively, and it also achieves a good performance with satisfactory accuracy for 8-class recognition.

The remainder of this paper is organized as follows. In Section 2, we present a detailed introduction to the sparse autoencoder and the deep sparse autoencoders, as well as their application to facial expression recognition. Section 3 mainly discusses the experimental results of facial expression recognition via the deep sparse autoencoders and evaluates its overall performance by comparison with three other state-of-the-art approaches. Finally, conclusions are summarized in Section 4.


Deep sparse autoencoders for facial expression recognition

In this section, we mainly introduce the sparse autoencoder and the deep neural network formed by stacked sparse autoencoders, which can learn discriminative features to distinguish the facial expressions.
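A minimal sketch of the stacked architecture's forward pass, assuming sigmoid encoder layers and a softmax output: each (W, b) pair stands for the encoder half of one pretrained sparse autoencoder, and the softmax layer on top yields class probabilities. The layer sizes and the 7-class output here are illustrative, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def dsae_forward(X, encoder_layers, Wout, bout):
    """Forward pass of a stacked-autoencoder network: chain the encoder
    halves, then classify the deepest representation with softmax."""
    H = X
    for W, b in encoder_layers:
        H = sigmoid(H @ W + b)
    return softmax(H @ Wout + bout)

# Toy dimensions standing in for the feature vector and 7 expression classes.
rng = np.random.default_rng(2)
dims = [128, 64, 32]                   # input -> hidden 1 -> hidden 2
layers = [(rng.normal(scale=0.1, size=(dims[i], dims[i + 1])),
           np.zeros(dims[i + 1])) for i in range(len(dims) - 1)]
Wout = rng.normal(scale=0.1, size=(dims[-1], 7)); bout = np.zeros(7)
probs = dsae_forward(rng.random((4, dims[0])), layers, Wout, bout)
print(probs.shape, np.allclose(probs.sum(axis=1), 1.0))   # (4, 7) True
```

In practice each encoder layer would first be pretrained greedily as a sparse autoencoder, and the whole network then fine-tuned with the labelled expression data.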

Database description

We utilize the extended Cohn–Kanade (CK+) database to evaluate the proposed facial expression recognition framework. The CK+ database, released in 2010, is the extension of the Cohn–Kanade (CK) database, which has become one of the most widely used benchmarks for evaluating the recognition performance of algorithms [23]. In particular, the number of emotion categories in CK+ is increased to eight, and all emotion labels are amended and validated for the purpose of improving

Conclusions

In this paper, we have presented a novel approach for facial expression recognition using deep sparse autoencoders (DSAE), which can automatically distinguish the expressions with high accuracy. Both the facial geometric and appearance features have been introduced to compose a high-dimensional feature with accurate and comprehensive information of emotions. Particularly, the DSAE-based deep learning framework has been established for facial expression recognition to identify the expressions

Acknowledgments

This work was supported in part by the UK–China Industry Academia Partnership Programme under grant UK-CIAPP-276, in part by the Korea Foundation for Advanced Studies, in part by the Natural Science Foundation of China under grant 61403319, in part by the Fujian Natural Science Foundation under grant 2015J05131, and in part by the Fujian Provincial Key Laboratory of Eco-Industrial Green Technology.


References (49)

  • J. Zhang et al., Passivity analysis for discrete-time neural networks with mixed time-delays and randomly occurring quantization effects, Neurocomputing (2016)
  • W. Zhang et al., Event-based state estimation for a class of complex networks with time-varying delays: a comparison principle approach, Phys. Lett. A (2017)
  • K. Anderson et al., A real-time automated system for the recognition of human facial expressions, IEEE Trans. Syst. Man Cybern. Part B Cybern. (2006)
  • Y. Bengio et al., Greedy layer-wise training of deep networks, Adv. Neural Inf. Process. Syst. (2007)
  • D. Chen et al., Blessing of dimensionality: high-dimensional feature and its efficient compression for face verification, 2013 IEEE Conference on Computer Vision and Pattern Recognition (2013)
  • H. Chen et al., Pinning controllability of autonomous Boolean control networks, Sci. China Inf. Sci. (2016)
  • T. Cootes et al., Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • N. Dalal et al., Histograms of oriented gradients for human detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
  • A. Dempster et al., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. (1977)
  • D. Ding et al., Event-based security control for discrete-time stochastic systems, IET Control Theory Appl. (2016)
  • P. Ekman et al., Constants across cultures in the face and emotion, J. Pers. Social Psychol. (1971)
  • T. Gritti et al., Local features based facial expression recognition with face registration errors, The 8th IEEE International Conference on Automatic Face and Gesture Recognition (2008)
  • G. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)
  • X. Huang et al., Spatiotemporal local monogenic binary patterns for facial expression recognition, IEEE Signal Process. Lett. (2012)

    Nianyin Zeng was born in Fujian Province, China, in 1986. He received the B.Eng. degree in electrical engineering and automation in 2008 and the Ph.D. degree in electrical engineering in 2013, both from Fuzhou University. From October 2012 to March 2013, he was a research assistant in the Department of Electrical and Electronic Engineering, the University of Hong Kong. Currently, he is an assistant professor with the Department of Instrumental & Electrical Engineering of Xiamen University. His current research interests include intelligent data analysis, computational intelligence, time-series modeling and applications. He is the author or co-author of several technical papers and also a very active reviewer for many international journals and conferences. Dr. Zeng is currently serving as an associate editor for Neurocomputing, and also an editorial board member for Biomedical Engineering Online (Springer), Journal of Advances in Biomedical Engineering and Technology, and Smart Healthcare.

    Hong Zhang received her bachelor’s degree in electrical engineering and automation from the Department of Mechanical & Electrical Engineering, Xiamen University, Xiamen, China, in 2015. She is currently pursuing the master’s degree in electrical testing technology and instruments at Xiamen University, Xiamen, China. Her research interests include image processing and deep learning techniques.

    Baoye Song received the B.S. degree in automation in 2005, the M.S. degree in control theory and control engineering in 2008 both from Qingdao University of Science and Technology, Qingdao, China, and the Ph.D. degree in control theory and control engineering in 2011 from Shandong University, Jinan, China. He has been with Shandong University of Science and Technology as a lecturer since 2011. His research interests include nonlinear filtering, wireless sensor network, mobile robot and fault diagnosis.

    Weibo Liu received his B.S. degree in electrical engineering from the Department of Electrical Engineering & Electronics, University of Liverpool, Liverpool, UK, in 2015. He is currently pursuing the Ph.D. degree in computer science at Brunel University London, London, UK. His research interests include big data analysis and deep learning techniques.

    Yurong Li was born in Fujian Province, China, in 1973. She received her master's degree in industry automation and Ph.D. in control theory and control engineering from Zhejiang University, Zhejiang, China, in 1997 and 2001, respectively. She is now a professor at Fuzhou University and, since 2007, a member of the Fujian Key Laboratory of Medical Instrumentation & Pharmaceutical Technology. Her research interests include biomedical instruments and intelligent information processing.

    Abdullah M. Dobaie received his B.Sc. in 1981 and M.Sc. in 1989, both in electronic and communication engineering from King Abdulaziz University in Saudi Arabia, and the Ph.D. degree in 1995 from Colorado State University in the USA. He has supervised many M.Sc. students and directed many projects concerning communication, digital filters, antennas and digital signal processing. His recent interests include adaptive communication systems, digital image processing, wave propagation and communication networks.
