Elsevier

Pattern Recognition

Volume 47, Issue 5, May 2014, Pages 1859-1868

Spontaneous facial expression recognition: A robust metric learning approach

https://doi.org/10.1016/j.patcog.2013.11.025

Highlights

  • A novel spontaneous expression recognition method is proposed.

  • It utilizes multiple annotators for expression labeling.

  • Robustness to annotation errors is ensured via estimation of error probability.

  • Expectation Maximization is used for learning the distance metric.

  • A high recognition accuracy is achieved via the robust metric learning.

Abstract

Spontaneous facial expression recognition is significantly more challenging than recognizing posed ones. We focus on two issues that are still under-addressed in this area. First, due to the inherent subtlety, the geometric and appearance features of spontaneous expressions tend to overlap with each other, making it hard for classifiers to find effective separation boundaries. Second, the training set usually contains dubious class labels which can hurt the recognition performance if no countermeasure is taken. In this paper, we propose a spontaneous expression recognition method based on robust metric learning with the aim of alleviating these two problems. In particular, to increase the discrimination of different facial expressions, we learn a new metric space in which spatially close data points have a higher probability of being in the same class. In addition, instead of using the noisy labels directly for metric learning, we define sensitivity and specificity to characterize the annotation reliability of each annotator. Then the distance metric and the annotators' reliability are jointly estimated by maximizing the likelihood of the observed class labels. With the introduction of latent variables representing the true class labels, the distance metric and the annotators' reliability can be iteratively solved under the Expectation Maximization framework. Comparative experiments show that our method achieves better recognition accuracy on spontaneous expression recognition, and the learned metric can be reliably transferred to recognize posed expressions.

Introduction

Human emotion recognition has long been an actively researched topic in Human Computer Interaction (HCI). Unlike other types of non-verbal communication, the human face is expressive and closely tied to an emotional state. The ability to interpret facial gestures is a key to a wide range of HCI applications. Researchers have achieved tremendous success in recognizing prototypical and posed facial expressions that are collected under tightly controlled conditions [1], [2], [3]. Since the most useful current and future face-related applications lie in a more natural context, it is our goal in this paper to develop a system that can operate on spontaneous expressions characterizing the natural interaction between humans and computers.

Quite a few studies have been done on spontaneous facial expression recognition [4], [5], [6], but with only limited progress. There are several factors affecting the recognition accuracy of spontaneous expressions, including facial feature representation, classifier design, useful contextual cues, etc. This paper focuses on two issues that are still under-addressed in this field.

First of all, spontaneous facial expressions tend to have overlapping geometric and appearance features, making it difficult to find effective classification boundaries [6]. The second issue, most often ignored, has to do with noisy labeling. Traditional supervised classification methods assume perfect data labels. However, in the case of spontaneous facial expression recognition, which involves only slight facial muscle actions, the class labels can be erroneously assigned due to the subjectivity or varied expertise of the annotators [7]. Classifiers trained on such data inevitably have their performance negatively affected.

In this paper, we present an automatic recognition system for spontaneous facial expressions. In particular, we make the following contributions.

First, we formulate spontaneous facial expression recognition as a maximum likelihood based metric learning problem. Under the learned distance metric, spatially close (distant) data points have a higher (lower) probability of being in the same class, thus facilitating kNN based classification.
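As a concrete sketch of this idea, the following NCA-style formulation assigns each pair of points a softmax probability of being same-class neighbors under a linear transform `L` (so the metric is M = LᵀL). This is an illustrative assumption, not necessarily the paper's exact likelihood:

```python
import numpy as np

def same_class_prob(X, L, i, j):
    """Probability that point j is selected as a same-class neighbor of
    point i under the linear transform L (Mahalanobis metric M = L^T L).
    NCA-style softmax over squared distances; a sketch, not the paper's
    exact model."""
    Z = X @ L.T                       # project into the learned space
    d2 = np.sum((Z[i] - Z) ** 2, axis=1)
    d2[i] = np.inf                    # a point never selects itself
    w = np.exp(-d2)
    return w[j] / w.sum()

# toy example: two tight clusters far apart
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
L = np.eye(2)
p = same_class_prob(X, L, 0, 1)       # neighbor within the same cluster
assert p > 0.9                        # nearly all probability mass stays local
```

Learning then amounts to choosing `L` so that this probability is high for same-class pairs, which is exactly what makes the subsequent kNN vote reliable.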

Second, we address the problem of noisy labeling via multi-annotation and reliability estimation. In particular, to increase robustness to noisy labels, multiple labels from different annotators are collected for each data point. The sensitivity and specificity of each annotator, which indicate annotation reliability, are estimated jointly with the distance metric under the Expectation Maximization (EM) framework via an efficient online learning algorithm.
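The annotator-reliability part of this estimation can be illustrated with a simplified, Dawid–Skene-style EM for binary labels that alternates between re-estimating each annotator's sensitivity/specificity and the posterior over the latent true labels. The metric-learning step and the paper's exact update equations are omitted, so treat this purely as an illustrative sketch:

```python
import numpy as np

def em_annotators(Y, n_iter=50, eps=1e-6):
    """Jointly estimate per-annotator sensitivity/specificity and the
    posterior over latent true binary labels from noisy annotations.
    Y: (n_points, n_annotators) array of 0/1 labels.
    Simplified Dawid-Skene-style EM; the metric-learning step is omitted."""
    mu = Y.mean(axis=1)                  # init posterior P(true=1) by majority vote
    for _ in range(n_iter):
        # M-step: annotator reliability given the current posteriors
        alpha = np.clip((mu @ Y) / mu.sum(), eps, 1 - eps)            # sensitivity
        beta = np.clip(((1 - mu) @ (1 - Y)) / (1 - mu).sum(), eps, 1 - eps)  # specificity
        prior = mu.mean()
        # E-step: posterior over the latent true label of each point
        a = prior * np.prod(alpha ** Y * (1 - alpha) ** (1 - Y), axis=1)
        b = (1 - prior) * np.prod(beta ** (1 - Y) * (1 - beta) ** Y, axis=1)
        mu = a / (a + b)
    return mu, alpha, beta

# two reliable annotators and one who flips some labels
true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
Y = np.stack([true, true, np.array([1, 1, 1, 0, 0, 0, 0, 1])], axis=1)
mu, alpha, beta = em_annotators(Y)
assert np.array_equal((mu > 0.5).astype(int), true)   # true labels recovered
```

The key design point, as in the paper, is that the latent true labels let reliability and the label posterior be solved iteratively rather than trusting any single annotator.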

Third, we extensively compare our method with other methods. Experiments show that our method not only performs significantly better in recognizing spontaneous expressions, but also generalizes well to posed expressions.

The rest of this paper is structured as follows. In Section 2, a brief review of related work is given. In Section 3, the problem setting is described. Section 4 describes the feature representation of a facial expression. We formulate the problem of Robust Metric Learning based expression recognition and give an efficient solution in Section 5. Experimental results are given in Section 6.

Related work

Facial expression recognition methods are usually concerned with 7 basic expressions (including neutral) as defined in [8], and may be broadly classified as static or dynamic. Static approaches classify an expression in a single static image without considering the contextual information implied by adjacent images of a sequence. Representative methods are Naive Bayesian [9], SVM [1], Adaboost [10], etc. In contrast, a dynamic approach, e.g. HMM [11], CRF [12], exploits the dependency between…

Problem setting and overview of our system

In this paper, we focus on the recognition of subtle facial expressions that are spontaneously produced. To this end, the Moving Faces and People (MFP) dataset [32] is used in our work. Here we briefly introduce MFP and describe how we adapt this dataset to suit our needs.

MFP is a large-scale database of static images and video clips of human faces and people. The major difference between this dataset and popular posed expression datasets (e.g. CK+ [31], MMI [33], JAFFE [34]) is that it is…

Facial feature extraction

We use a fusion of face shape and texture to represent a facial expression, as shown in Fig. 4. This hybrid representation is able to incorporate local pixel intensity variation pattern while still adhering to shape constraint at a global level.
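A minimal sketch of such a hybrid representation, assuming normalized landmark coordinates for the shape part and per-landmark intensity histograms for the texture part (the paper's actual descriptors may differ):

```python
import numpy as np

def fuse_shape_texture(landmarks, image, patch=8, bins=16):
    """Fuse a normalized landmark shape vector with local texture histograms.
    landmarks: (K, 2) facial landmark coordinates; image: 2D grayscale array.
    A hypothetical sketch of a hybrid shape+texture feature, not the paper's
    exact descriptors."""
    # shape part: translation- and scale-normalized landmark coordinates
    centered = landmarks - landmarks.mean(axis=0)
    shape_vec = (centered / np.linalg.norm(centered)).ravel()
    # texture part: intensity histogram of a patch around each landmark
    h, w = image.shape
    hists = []
    for x, y in landmarks.astype(int):
        x0, x1 = max(x - patch, 0), min(x + patch, w)
        y0, y1 = max(y - patch, 0), min(y + patch, h)
        hist, _ = np.histogram(image[y0:y1, x0:x1], bins=bins, range=(0, 255))
        hists.append(hist / max(hist.sum(), 1))   # normalize each local histogram
    return np.concatenate([shape_vec, np.concatenate(hists)])
```

Concatenating the two parts gives a single vector in which the texture entries carry local intensity variation while the shape entries preserve the global geometric constraint.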

Robust metric learning for spontaneous facial expressions

In this section, we first formulate the problem of spontaneous facial expression recognition using a Robust Metric Learning approach, then give an efficient solution to the resulting optimization problem via Expectation Maximization.

Experiments

After learning the robust distance metric, given a novel facial expression, its class label can be identified by first retrieving the training examples that are predicted to be in the same class as the novel expression, then performing majority voting using the actual expression labels of these training examples. Two groups of experiments are conducted in our study. In the first group, Robust Metric Learning is extensively evaluated against a number of state-of-the-art methods in terms of…
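The voting step described above can be sketched as a kNN classifier under a learned Mahalanobis metric `M`; the names and the exact retrieval rule here are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(M, X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training examples
    under a learned Mahalanobis metric M, i.e. d(a, b)^2 = (a-b)^T M (a-b).
    A sketch of the voting step; the paper's retrieval details may differ."""
    diffs = X_train - x
    d2 = np.einsum('ij,jk,ik->i', diffs, M, diffs)   # squared metric distances
    nn = np.argsort(d2)[:k]                          # k nearest neighbors
    return Counter(y_train[nn]).most_common(1)[0][0] # majority vote

# toy example: query near the first cluster gets that cluster's label
X_train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(np.eye(2), X_train, y_train, np.array([0.05, 0.05]), k=3)
assert pred == 0
```

With a well-learned `M`, same-class examples dominate the neighbor set, so the majority vote is robust even when individual training labels are noisy.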

Conclusion

In this work, we propose a Robust Metric Learning method for spontaneous facial expression recognition. In contrast to traditional supervised classification methods, we explicitly take into account the potential label errors when designing our method. In particular, we collect subjective (possibly erroneous) labels from multiple annotators. In practice, there is a substantial amount of disagreement among the annotators. The proposed Expectation Maximization based framework iteratively…

Conflict of interest

None declared.

Shaohua Wan received the Bachelor of Science Degree from Beijing University of Posts and Telecommunications in 2011. He entered the University of Texas at Austin in Fall 2011. His research interests include facial expression analysis, similarity based search, metric learning.

References (46)

  • Y. Wang, H. Ai, B. Wu, C. Huang, Real time facial expression recognition with adaboost, in: Proceedings of the 17th...
  • S. Das et al.

An HMM based model for prediction of emotional composition of a facial expression using both significant and insignificant action units and associated gender differences

    Int. J. Comput. Appl.

    (2012)
  • S. Jain, C. Hu, J.K. Aggarwal, Facial expression recognition with temporal modeling of shapes, in: 2011 IEEE...
  • M. Pantic et al.

Automatic analysis of facial expressions: the state of the art

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • S. Park, D. Kim, Spontaneous facial expression classification with facial motion vectors, in: IEEE International...
  • Z. Zeng et al.

A survey of affect recognition methods: audio, visual, and spontaneous expressions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • M. Pantic et al.

    Facial action recognition for facial expression analysis from static face images

IEEE Trans. Syst., Man, Cybern., Part B: Cybern.

    (2004)
  • M. Pantic et al.

    Machine analysis of facial expressions

  • S.M. Lajevardi, M. Lech, Averaged Gabor filter features for facial expression recognition, in: Proceedings of the 2008...
  • J. Whitehill, C. Omlin, Haar features for FACS AU recognition, in: 7th International Conference on Automatic Face and...
  • S.Z. Li, A.K. Jain (Eds.), Facial Expression Analysis: Handbook of Face Recognition, 2nd edition, Springer,...
  • E. Xing, A. Ng, M. Jordan, S. Russell, Distance metric learning, with application to clustering with side-information,...
  • K.Q. Weinberger et al.

    Distance metric learning for large margin nearest neighbor classification

    J. Mach. Learn. Res.

    (2009)


J.K. Aggarwal is on the faculty of The University of Texas at Austin College of Engineering and is currently a Cullen Professor of Electrical and Computer Engineering and Director of the Computer and Vision Research Center. His research interests include computer vision and pattern recognition, focusing on human motion.

1 Tel.: +1 512 471 1369.