Neurocomputing

Volume 82, 1 April 2012, Pages 99-108

A remarkable standard for estimating the performance of 3D facial expression features

https://doi.org/10.1016/j.neucom.2011.10.029

Abstract

Previous work on 3D facial expression recognition has been based on different feature extraction algorithms and different classifiers, so there is no uniform standard for identifying which features are the “best”. This paper investigates KL divergence (relative entropy) as a measure of discrimination power to determine the “best” features in this field. From our experiments, we conclude that local facial expression features in flow-matrix form are more beneficial to 3D facial expression recognition than those in geometry-matrix form; that feature points in local expression regions are more discriminative than points on the face contour; and that slope and angle features are more powerful than distance features. Above all, this paper verifies that KL divergence can serve as the standard for determining the “best” features for recognizing 3D facial expressions. This is the first exploration of the BU-3DFE (Binghamton University 3D Facial Expression) database aimed at finding a standard for evaluating extracted facial expression features, and the results are significant for 3D facial expression feature extraction.

Introduction

Facial expression recognition is a challenging topic in numerous psychological studies. People's mental states can be inferred from their facial expressions, which makes the technology valuable in security systems, entertainment venues, and other settings. Research on facial expression recognition will also open a broad new way to change the traditional human–machine interaction mode. All of these requirements have inspired many researchers to dedicate themselves to facial expression recognition.

Facial expression recognition has been studied for decades. Traditionally, it is based on 2D static images [1], [2], [3], [4], [5] or 2D video sequences [6], [7], [8], using a wide variety of features (e.g., LBP [1], image ratio features [2], Gabor wavelets [4]) and approaches (e.g., LPP [5], HMM [6], [7]). It is impossible to name all of them here (for further discussion, one can refer to the excellent survey [9]). However, the common theme in these studies is that facial models are treated as flat patterns, so expression variations are considered only on the picture plane. It is therefore hard to detect subtle in-depth skin motions, which can significantly affect the performance of facial expression recognition [10]. Moreover, large variations in head pose and illumination deteriorate the effectiveness of facial feature tracking in 2D facial expression recognition. Expression analysis on the 3D facial surface can overcome these challenges and reduce the influence of large pose variations.

Facial expressions are closely related to shape changes of the whole 3D facial surface, and previous literature has pointed out the distinctive advantages of expression processing in multi-dimensional space [11]. Recent work [12], [13], [14] has shown that approaches based on 3D facial models can perform better than those based on 2D images. The work in [15] constructed multimodal probabilistic graphical models for tensor-format data, bridging the gap between tensor descriptors and vector-based feature representations. All of these exploratory studies motivate us to target facial expression recognition in 3D space.

To foster 3D facial expression research, Binghamton University created a 3D facial expression database, named the BU-3DFE database [10], which includes 100 subjects with 2500 facial models. Research in this area has made great progress recently, and the following list summarizes the work most closely related to the BU-3DFE database:

  • Wang et al. [16] were pioneers in 3D facial expression recognition using the BU-3DFE database. They constructed seven local expression regions based on 64 primary points and extracted 12 primitive facial features from each local region; with an LDA classifier, the highest recognition rate exceeded 80%.

  • Soyel and Demirel [17] extracted six distance vectors based on facial shape information for facial expression recognition, raising the average facial expression recognition rate to 91.3% with a previously trained neural network. In [18], they used another six distance vectors and reached an average recognition rate of 87.8%.

  • Tang and Huang [19] addressed facial expression recognition with a regularized multi-class AdaBoost classifier. The selected features fused 24 normalized Euclidean distances, and they achieved an average recognition rate of 95.1%. In addition, they extracted a set of 96 distinguishing features describing the properties of line segments for facial expression recognition in [20].

  • Venkatesh et al. [21] proposed a modified PCA to extract discriminative features from a limited set of feature points. They then employed only spectral flow matrices as features to recognize facial expressions based on a uniformly sampled 3D matrix structure [22]. The proposed method achieved an average recognition rate of 85.56% with the k-means clustering algorithm, but the uniformly sampled 3D matrix structure is itself noteworthy.

  • Tekguc et al. [12] used NSGA-II to determine the optimal set of facial features from the entire feature space, which was produced from normalized distance vectors on the BU-3DFE database. The average classification rate of the proposed strategy reached 88.18%, which was better than 2D methods such as the Gabor wavelet and topographic context methods, and better than the method proposed in [16].

  • NCBF [23], which selects features with high relevance and low redundancy, was introduced to extract facial expression features; PCA was then performed on the selected features to obtain the most discriminative information. This two-step approach outperformed conventional methods.

As this previous work shows, recent research has mainly focused on the challenges posed by the BU-3DFE database. However, because different classifiers are used for facial expression recognition, there is no uniform standard for determining whether one set of extracted features performs better than another. An attractive scheme is an automatic algorithm that can identify the “best” features from a large feature pool.

To select the “best” features, this paper proposes an automatic algorithm based on the Kullback–Leibler divergence (KLD), also known as relative entropy, which is able to identify whether a feature is among the “best”. The relative entropy metric was already used in [24], [25] to measure distances between models.
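For concreteness, the KLD between two discrete probability distributions P and Q takes the standard textbook form below; in this paper it is applied to the class-conditional distributions of candidate features:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```

The divergence is non-negative, equals zero only when P = Q, and is asymmetric, so a symmetrized sum is often used when comparing pairs of classes.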

Building on previous work in both 3D facial expression recognition and KL divergence applications, this paper extracts different sets of facial expression features from the BU-3DFE database, as detailed in Section 2. The Kullback–Leibler divergences serving as the discrimination power of all feature types are computed in Section 3. Detailed experimental results are described in Section 4. Sections 5 and 6 present the discussion and concluding remarks, respectively.

Section snippets

Candidate feature pool generation

In the BU-3DFE database, 83 feature vertices are marked on each cropped facial model. Given the set of labeled feature points, the feature regions on the face surface can be easily detected, and these features can serve as a baseline for evaluating algorithms for 3D facial expression analysis. Their distribution and corresponding numbering are presented in Fig. 1, rendered with VrmlPad.
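As an illustrative sketch only (not the paper's exact feature definitions), the following Python snippet shows how distance, slope, and angle features of the kind evaluated later could be computed from the 83 labeled feature points of a model; the landmark index pairs are hypothetical placeholders:

```python
import numpy as np

def pairwise_features(points, index_pairs):
    """Compute distance, slope, and angle features for selected
    pairs of labeled 3D feature points.

    points      : (83, 3) array of x, y, z landmark coordinates
    index_pairs : list of (i, j) landmark index pairs (hypothetical here)
    """
    feats = []
    for i, j in index_pairs:
        d = points[j] - points[i]
        dist = np.linalg.norm(d)                          # Euclidean distance
        # Slope of the segment relative to the x-y plane.
        slope = d[2] / (np.hypot(d[0], d[1]) + 1e-12)
        # Angle between the segment and the vertical (z) axis.
        angle = np.arccos(np.clip(d[2] / (dist + 1e-12), -1.0, 1.0))
        feats.extend([dist, slope, angle])
    return np.asarray(feats)

# Hypothetical usage: 83 landmarks, a few placeholder index pairs.
landmarks = np.random.rand(83, 3)
pairs = [(0, 16), (36, 45), (48, 54)]
print(pairwise_features(landmarks, pairs).shape)          # (9,)
```

Concatenating such triples over a chosen set of point pairs would yield one entry of the candidate feature pool.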

Discrimination power computations

Discriminant characteristics are directly related to recognition performance. Generally, a feature that preserves more discrimination power leads to better recognition performance. From the perspective of Bayesian analysis, the discrimination power of a feature depends on the heterogeneous conditional probability distribution [19]: if the probability value is large, the feature preserves more discrimination. In this paper, an excellent metric, KLD, is introduced to evaluate the
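A minimal sketch of such a discrimination power (DP) score, assuming each feature's class-conditional distribution is modeled as a univariate Gaussian and summing symmetric KLDs over all expression-class pairs (the Gaussian modeling choice and function names here are our assumptions, not necessarily the paper's exact procedure):

```python
import numpy as np
from itertools import combinations

def gaussian_kld(mu1, var1, mu2, var2):
    """Closed-form KL divergence between two univariate Gaussians."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def discrimination_power(feature_values, labels):
    """Score one feature by the sum of symmetric KLDs between the
    class-conditional Gaussians of every pair of expression classes."""
    stats = {c: (feature_values[labels == c].mean(),
                 feature_values[labels == c].var() + 1e-12)
             for c in np.unique(labels)}
    dp = 0.0
    for a, b in combinations(stats, 2):
        dp += gaussian_kld(*stats[a], *stats[b]) + gaussian_kld(*stats[b], *stats[a])
    return dp

# Hypothetical usage: 600 synthetic samples over 6 expression classes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(6), 100)
values = rng.normal(loc=labels * 0.5, scale=1.0)   # synthetic feature
print(discrimination_power(values, labels))
```

Candidate features can then be ranked by this score, with larger values indicating stronger separation between expression classes.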

Experimental environment and database description

Our experimental environment is an Intel(R) Pentium(R) 4 CPU with a 3.0 GHz base frequency and 2 GB RAM. All of the following experiments are based on the BU-3DFE database [10].

The BU-3DFE database is the 3D facial expression database built by Binghamton University in 2006. It contains a total of 100 subjects (56 female and 44 male) with a variety of ethnic ancestries and ages. Each subject performed a neutral expression and six others (anger, disgust, fear, happiness, sadness, and surprise). Apart from

Discussions

According to the results in Sections 4.2 (DP comparisons of different facial expression features), 4.3 (verification based on the PNN architecture), and 4.4 (performance evaluation using subspace selection methods), we can observe that

  • (1) As seen from the comparative discrimination power of Feature 1 and Feature 3 in Fig. 2, the facial features in flow-matrix form carry more discrimination power than those in geometry-matrix form. However, the discrimination power of Feature 2 in both flow-matrix form and

Conclusions

This paper investigates KL divergence (relative entropy) to measure discrimination power over different sets of facial expression features, including distance features, slope features, angle features, VTC, and their fusion. A PNN architecture is then employed to verify this strategy, and the subspace selection methods, FLDA and MGMD with Euclidean and Mahalanobis distances, are used for performance evaluation. According to the results in Section 5, we can conclude that

  • (1) The

Acknowledgments

This work is supported by the National Natural Science Foundation of China (60973060), the Research Fund for the Doctoral Program (200800040008), and the Fundamental Research Funds for the Central Universities (2009YJS025).


References (32)

  • Y.L. Tian et al., Facial expression analysis
  • L. Yin, X. Wei, Y. Sun, J. Wang, M.J. Rosato, A 3D facial expression database for facial behavior research, in: ...
  • J. Russell, Is there universal recognition of emotion from facial expression?, Psychol. Bull. (1994)
  • U. Tekguc, H. Soyel, H. Demirel, Feature selection for person-independent 3D facial expression recognition using ...
  • L. Zalewski, S. Gong, Synthesis and recognition of facial expressions in virtual 3D views, in: Proceedings of the Sixth ...
  • Z. Wen, T.S. Huang, Capturing subtle facial motions in 3D face tracking, in: Proceedings of the Ninth IEEE ...

Xiaoli Li received her B.S. degree in Computer and Information Technology from Beijing Jiaotong University, China in 2008. Currently she is a Ph.D. candidate in Signal and Information Processing at Beijing Jiaotong University, China. Her research is devoted to image processing, pattern recognition, computer vision, etc.

Qiuqi Ruan, Professor and Ph.D. Supervisor, was born in 1944. He received his B.S. and M.S. degrees from Northern Jiaotong University, PR China, in 1969 and 1981, respectively. From January 1987 to May 1990, he was a visiting scholar at the University of Pittsburgh and the University of Cincinnati. Subsequently, he has been a visiting professor in the USA several times. He is the author or co-author of over 350 technical papers and has edited 8 books on image processing and information science. He is a member of the Appraise Discipline Group of the State Council Degree Committee, a senior member of the IEEE, Chairman of the Technology Committee of the IEEE Beijing Branch, etc. His main research interests include digital signal processing, computer vision, pattern recognition, virtual reality, etc.

Yue Ming is currently pursuing her Ph.D. degree at the Institute of Information Science, Beijing Jiaotong University. Her research interests include image processing, pattern recognition, 3D face recognition and reconstruction, etc.
