Spontaneous facial expression recognition: A robust metric learning approach
Introduction
Human emotion recognition has long been an actively researched topic in Human Computer Interaction (HCI). Unlike other types of non-verbal communication, the human face is expressive and closely tied to an emotional state. The ability to interpret facial gestures is key to a wide range of HCI applications. Researchers have achieved tremendous success in recognizing prototypical and posed facial expressions collected under tightly controlled conditions [1], [2], [3]. Since the most useful current and future face-related applications lie in more natural contexts, our goal in this paper is to develop a system that operates on the spontaneous expressions that characterize natural interaction between humans and computers.
Quite a few studies have been done on spontaneous facial expression recognition [4], [5], [6], but with only limited progress. There are several factors affecting the recognition accuracy of spontaneous expressions, including facial feature representation, classifier design, useful contextual cues, etc. This paper focuses on two issues that are still under-addressed in this field.
First of all, spontaneous facial expressions tend to have overlapping geometric and appearance features, making it difficult to find effective classification boundaries [6]. The second issue, most often ignored, has to do with noisy labeling. Traditional supervised classification methods assume perfect data labels. However, in the case of spontaneous facial expression recognition, which involves only slight facial muscle actions, the class labels can be erroneously assigned due to the subjectivity or varied expertise of the annotators [7]. Classifiers trained on such data inevitably have their performance negatively affected.
In this paper, we present an automatic recognition system for spontaneous facial expressions. In particular, we make the following contributions.
First, we formulate spontaneous facial expression recognition as a maximum likelihood based metric learning problem. Under the learned distance metric, spatially close data points have a higher probability of being in the same class (and distant points in different classes), thus facilitating kNN based classification.
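The paper's exact likelihood is only summarized here; as an illustrative sketch (the function names and the softmax form, in the style of Neighbourhood Components Analysis, are assumptions rather than the authors' formulation), a learned positive semi-definite matrix M defines a Mahalanobis distance that can be converted into same-class probabilities:

```python
import numpy as np

def mahalanobis_sq(xi, xj, M):
    """Squared Mahalanobis distance under a PSD metric matrix M."""
    d = xi - xj
    return float(d @ M @ d)

def same_class_probs(X, i, M):
    """Probability that each point j is a same-class neighbour of point i,
    obtained by a softmax over negative distances (NCA-style sketch)."""
    dists = np.array([mahalanobis_sq(X[i], X[j], M) for j in range(len(X))])
    dists[i] = np.inf                 # a point is not its own neighbour
    w = np.exp(-dists)
    return w / w.sum()
```

Under such a model, shrinking the metric distance between same-class pairs directly raises their same-class probability, which is what makes the subsequent kNN voting effective.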
Second, we address the problem of noisy labeling via multi-annotation and reliability estimation. In particular, to increase robustness to noisy labels, multiple labels from different annotators are collected for each data point. The sensitivity and specificity of each annotator, which indicate annotation reliability, are estimated jointly with the distance metric under the Expectation Maximization (EM) framework via an efficient online learning algorithm.
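A minimal, hypothetical sketch of the reliability-estimation half of such an EM loop, in the style of the Dawid-Skene model for binary labels (the metric-learning term and the online updates are omitted, and all names are assumptions, not the authors' algorithm):

```python
import numpy as np

def em_annotator_reliability(Y, n_iter=50):
    """Estimate per-annotator sensitivity (alpha) and specificity (beta)
    together with posterior soft labels mu, from binary annotations
    Y[i, r] in {0, 1} for item i and annotator r.
    Simplified sketch: uniform class prior, no metric-learning term."""
    n, R = Y.shape
    mu = Y.mean(axis=1)                            # init: soft majority vote
    for _ in range(n_iter):
        # M-step: reliability estimates given the current soft labels.
        alpha = (mu @ Y) / mu.sum()                # P(label=1 | true=1)
        beta = ((1 - mu) @ (1 - Y)) / (1 - mu).sum()  # P(label=0 | true=0)
        # E-step: posterior probability that each item's true label is 1.
        p1 = np.prod(alpha ** Y * (1 - alpha) ** (1 - Y), axis=1)
        p0 = np.prod(beta ** (1 - Y) * (1 - beta) ** Y, axis=1)
        mu = p1 / (p1 + p0)
    return alpha, beta, mu
```

Annotators who frequently disagree with the inferred labels receive low sensitivity or specificity, so their labels are automatically down-weighted in the next E-step.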
Third, we extensively compare our method with other methods. Experiments show that our method not only performs significantly better in recognizing spontaneous expressions, but also generalizes well to posed expressions.
The rest of this paper is structured as follows. In Section 2, a brief review of related work is given. In Section 3, the problem setting is described. Section 4 describes the feature representation of a facial expression. We formulate the problem of Robust Metric Learning based expression recognition and give an efficient solution in Section 5. Experimental results are given in Section 6.
Related work
Facial expression recognition methods are usually concerned with 7 basic expressions (including neutral) as defined in [8], and may be broadly classified as static or dynamic. Static approaches classify an expression in a single static image without considering the contextual information implied by adjacent images of a sequence. Representative methods are Naive Bayesian [9], SVM [1], Adaboost [10], etc. In contrast, a dynamic approach, e.g. HMM [11], CRF [12], exploits the dependency between
Problem setting and overview of our system
In this paper, we focus on the recognition of subtle facial expressions that are spontaneously produced. To this end, the Moving Faces and People (MFP) dataset [32] is used in our work. Here we briefly introduce MFP and describe how we adapt this dataset to suit our needs.
MFP is a large-scale database of static images and video clips of human faces and people. The major difference between this dataset and popular posed expression datasets (e.g. CK+ [31], MMI [33], JAFFE [34]) is that it is
Facial feature extraction
We use a fusion of face shape and texture to represent a facial expression, as shown in Fig. 4. This hybrid representation incorporates local pixel-intensity variation patterns while still adhering to shape constraints at a global level.
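As an illustrative sketch of such a hybrid shape-plus-texture representation (the normalization and weighting below are assumptions; the paper's exact fusion scheme is not reproduced in this snippet):

```python
import numpy as np

def fuse_features(landmarks, texture, shape_weight=1.0):
    """Concatenate z-normalized shape features (landmark coordinates)
    and texture features (e.g. an appearance descriptor) into a single
    vector. Normalization keeps one modality from dominating distances."""
    s = (landmarks - landmarks.mean()) / (landmarks.std() + 1e-8)
    t = (texture - texture.mean()) / (texture.std() + 1e-8)
    return np.concatenate([shape_weight * s.ravel(), t.ravel()])
```

The `shape_weight` parameter is a hypothetical knob for balancing the two modalities; any learned metric downstream can also absorb this scaling.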
Robust metric learning for spontaneous facial expressions
In this section, we first formulate the problem of spontaneous facial expression recognition using a Robust Metric Learning approach, then give an efficient solution to the resulting optimization problem via Expectation Maximization.
Experiments
After learning the robust distance metric, given a novel facial expression, its class label can be identified by first retrieving the training examples that are predicted to be in the same class as the novel expression, then performing majority voting using the actual expression labels of these training examples. Two groups of experiments are conducted in our study. In the first group, Robust Metric Learning is extensively evaluated against a number of the state-of-the-art methods in terms of
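The retrieve-and-vote step described above can be sketched as follows (assuming a learned PSD metric matrix M; the function name is hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, M, k=5):
    """Classify a novel expression x by majority vote among its k nearest
    training examples under the learned Mahalanobis metric M."""
    diffs = X_train - x
    # Squared Mahalanobis distance to every training example at once.
    dists = np.einsum('ij,jk,ik->i', diffs, M, diffs)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]
```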
Conclusion
In this work, we propose a Robust Metric Learning method for spontaneous facial expression recognition. In contrast to traditional supervised classification methods, we explicitly take into account the potential label errors when designing our method. In particular, we collect subjective (possibly erroneous) labels from multiple annotators. In practice, there is a substantial amount of disagreement among the annotators. The proposed Expectation Maximization based framework iteratively
Conflict of interest
None declared.
Shaohua Wan received the Bachelor of Science Degree from Beijing University of Posts and Telecommunications in 2011. He entered the University of Texas at Austin in Fall 2011. His research interests include facial expression analysis, similarity based search, metric learning.
References (46)
- et al., Texture and shape information fusion for facial expression and facial action unit recognition, Pattern Recognit. (2008)
- et al., Salient feature and reliable classifier selection for facial expression classification, Pattern Recognit. (2010)
- et al., Hybrid-boost learning for multi-pose face detection and facial expression recognition, Pattern Recognit. (2008)
- et al., Recognition of facial expressions using Gabor wavelets and learning vector quantization, Eng. Appl. Artif. Intell. (2008)
- M.F. Valstar, M. Pantic, Z. Ambadar, J.F. Cohn, Spontaneous vs. posed facial behavior: automatic analysis of brow...
- et al., The timing of facial motion in posed and spontaneous smiles, Int. J. Wavel., Multiresolut. Inf. Process. (2004)
- S. Wan, J.K. Aggarwal, A scalable metric learning-based voting method for expression recognition, in: 10th IEEE...
- Judgments of emotion from spontaneous facial expressions of New Guineans, Emotions (2007)
- et al., Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues (1975)
- X. Sun, L. Rothkrantz, D. Datcu, P. Wiggers, A Bayesian approach to recognise facial expressions using vector flows,...
- An HMM based model for prediction of emotional composition of a facial expression using both significant and insignificant action units and associated gender differences, Int. J. Comput. Appl.
- Automatic analysis of facial expressions: the state of the art, IEEE Trans. Pattern Anal. Mach. Intell.
- A survey of affect recognition methods: audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell.
- Facial action recognition for facial expression analysis from static face images, IEEE Trans. Syst., Man, Cybern., Part B: Cybern.
- Machine analysis of facial expressions
- Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res.
J.K. Aggarwal is on the faculty of The University of Texas at Austin College of Engineering and is currently a Cullen Professor of Electrical and Computer Engineering and Director of the Computer and Vision Research Center. His research interests include computer vision, pattern recognition focusing on human motion.