Elsevier

Pattern Recognition

Volume 52, April 2016, Pages 174-185

An effective methodology for dynamic 3D facial expression retrieval

https://doi.org/10.1016/j.patcog.2015.10.012

Highlights

  • We illustrate a novel retrieval methodology for dynamic sequences of 3D face scans.

  • We present a detailed evaluation of the introduced retrieval scheme.

  • BU-4DFE and BP4D-Spontaneous data sets were used for experiments.

  • Retrieval results are used to achieve unsupervised facial expression recognition.

  • The presented results outperform state-of-the-art retrieval schemes.

Abstract

The problem of facial expression recognition in dynamic sequences of 3D face scans has received a significant amount of attention in the recent past whereas the problem of retrieval in this type of data has not. A novel retrieval methodology for such data is introduced in this paper. The proposed methodology automatically detects specific facial landmarks and uses them to create a descriptor. This descriptor is the concatenation of three sub-descriptors which capture topological as well as geometric information of the 3D face scans. The motivation behind the proposed hybrid facial expression descriptor is the fact that some facial expressions, like happiness and surprise, are characterized by obvious changes in the mouth topology while others, like anger, fear and sadness, produce geometric but no significant topological changes. The proposed retrieval scheme exploits the Dynamic Time Warping technique in order to compare descriptors corresponding to different 3D facial sequences. A detailed evaluation of the introduced retrieval scheme is presented showing that it outperforms previous state-of-the-art retrieval schemes. Experiments have been conducted using the six prototypical expressions of the standard dataset BU-4DFE and the eight prototypical expressions of the recently available dataset BP4D-Spontaneous. Finally, a majority voting scheme based on the retrieval results is used to achieve unsupervised dynamic 3D facial expression recognition. The achieved classification accuracy is comparable to the state-of-the-art supervised dynamic 3D facial expression recognition techniques.

Introduction

Facial expressions are generated by facial muscle movements, resulting in temporary deformation of the face. In recent years, automatic analysis of facial expressions has emerged as an active research area due to its various applications such as human–computer interaction, human behavior understanding, biometrics, emotion recognition, computer graphics, driver fatigue detection, and psychology. Ekman [1] was the first to systematically study human facial expressions. His study categorizes the prototypical facial expressions, apart from neutral expression, into six classes representing anger, disgust, fear, happiness, sadness and surprise. This categorization is consistent across different ethnicities and cultures. Furthermore, each of the six aforementioned expressions is mapped to specific movements of facial muscles, called Action Units (AUs). This led to the Facial Action Coding System (FACS), where facial changes are described in terms of AUs.

The recent availability of 4D data1 has increased research interest in the field. The first dataset consisting of 4D facial data was BU-4DFE, presented by Yin et al. [2]. BU-4DFE was created at the State University of New York at Binghamton and was made available in 2008. It involves 101 subjects (58 females and 43 males) of various ethnicities. For each subject the six basic expressions were recorded. Yin et al. [3], [4] also presented the BP4D-Spontaneous dataset to the research community in 2013. This dataset contains high-resolution spontaneous 3D dynamic facial expressions. It involves 41 subjects (23 females and 18 males) of various ethnicities. Each of the aforementioned datasets is accompanied by a number of facial landmarks marked on each 3D frame. Table 1 illustrates the existing 4D facial expression datasets. Finally, the Hi4D-ADSIP dataset was presented by Matuszewski et al. [5]. The dataset was created at the University of Central Lancashire and is not publicly available yet. It contains 80 subjects (48 females and 32 males) of various ages and ethnic origins. Each subject was recorded for seven basic expressions (anger, disgust, fear, happiness, sadness, surprise and pain).

A lot of research has been dedicated to addressing the problem of facial expression recognition in dynamic sequences of 3D face scans. On the other hand, to the best of our knowledge, little research on facial expression retrieval using dynamic 3D face scans appears in the literature. The motivation behind the proposed retrieval scheme descriptor is the following: some facial expressions, like happiness and surprise, are characterized by obvious changes of the mouth topology. These expressions can be easily retrieved by using a topological descriptor. On the other hand, there are facial expressions, like anger, fear and sadness, that produce geometric but no significant topological changes. For these cases, a geometric descriptor should be used in order to complement the topological descriptor.

In the present work, a novel dynamic 3D facial expression retrieval scheme is proposed. For the creation of the scheme's descriptor, we use 3D facial landmarks. Instead of using the landmarks provided by the dynamic 3D facial expression dataset, we use an algorithm for the automatic detection of the desired facial landmarks along time frames. The result is a hybrid descriptor capturing topological as well as geometric information of the 3D face scans. The Dynamic Time Warping technique is used in order to compare descriptors corresponding to different 3D facial sequences.

Furthermore, majority voting is applied on the retrieval results in order to perform unsupervised 4D facial expression recognition. In this way, our descriptor can be compared to the state-of-the-art descriptors used for recognition. The achieved classification accuracy is comparable to that of state-of-the-art supervised 4D facial expression recognition techniques. Finally, experimental results in the area of 4D facial expression retrieval are illustrated. Experiments have been conducted using the six prototypical expressions of the standard dataset BU-4DFE and, for the first time for facial expression retrieval or recognition purposes, the eight prototypical expressions of the recently available BP4D-Spontaneous dataset. The proposed retrieval methodology outperforms state-of-the-art methodologies.
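The voting step described above is straightforward: the query sequence receives the expression label that occurs most often among its top-ranked retrieved sequences. A minimal sketch (not the authors' implementation; the function name and the choice of k are illustrative assumptions):

```python
from collections import Counter

def classify_by_retrieval(retrieved_labels, k=5):
    """Unsupervised recognition via majority voting: assign the query the
    most frequent expression label among its k nearest retrieved sequences.

    retrieved_labels: labels of retrieved sequences, ordered by similarity.
    """
    votes = Counter(retrieved_labels[:k])
    # most_common(1) returns [(label, count)] for the winning label
    return votes.most_common(1)[0][0]
```

Because no training stage is involved, the quality of the recognition depends entirely on how well the retrieval ranking groups sequences of the same expression.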

The remainder of the paper is organized as follows. In Section 2, previous works on the field of 4D facial expression recognition are reviewed. In Section 3, the proposed retrieval scheme is explicitly described and illustrated. In Section 4, the experimental results of the new retrieval scheme are presented and discussed. Finally, conclusions are drawn in Section 5.


Related work

Due to insufficient previous work in 4D facial expression retrieval, the current section deals with expression recognition. However, we concentrate on the descriptors and the 4D representation used in expression recognition, which are also related to the retrieval process. A detailed survey on 4D video facial expression recognition methodologies is presented in [6]. In this survey, methodologies are reviewed and categorized based on the dynamic face analysis approach that they use. Dynamic face

Methodology

The proposed methodology consists of three steps. First, landmarks are automatically extracted from each 3D frame of a 3D facial expression sequence. Second, a descriptor is created from topological and geometric information arising from the extracted landmarks. Third, the Dynamic Time Warping technique is used to enable the comparison of different descriptors. The pipeline of the new retrieval scheme is illustrated in Fig. 1.
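The third step compares descriptors of different lengths, since facial sequences differ in frame count; DTW handles this by finding an optimal alignment between the two frame sequences. A minimal sketch of the standard DTW recurrence, assuming per-frame descriptors compared with Euclidean distance (the function name and cost choice are illustrative, not the paper's exact formulation):

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two descriptor sequences.

    seq_a, seq_b: arrays of shape (n_frames, descriptor_dim), one row per
    3D frame. Sequences may have different numbers of frames.
    """
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # extend the cheapest of: match, insertion, deletion
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```

The warping makes the comparison robust to expressions performed at different speeds, which is essential when ranking sequences for retrieval.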

As discussed in Section 2, the majority of 4D facial expression

Experimental results

The datasets we used to conduct our experiments are BU-4DFE and BP4D-Spontaneous. It is important to mention that the BP4D-Spontaneous dataset is used here for the first time for facial expression retrieval/recognition; so far, it has only been used for AU recognition.

The BU-4DFE dataset was created by Yin et al. [2]. It is the first dataset consisting of faces recorded in 3D video and involves 101 subjects (58 females and 43 males) of various ethnicities. For each subject, the six basic expressions were recorded.

Conclusions

Dynamic 3D facial expression analysis constitutes a crucial open research field due to its applications in human–computer interaction, psychology, biometrics, etc. In this paper, a new scheme for dynamic 3D facial expression retrieval is presented. The new scheme automatically detects facial landmarks on 3D scans and uses them to create the GeoTopo+ descriptor. The GeoTopo+ descriptor captures both topological and geometric information of 3D face scans. Experiments have been conducted on the BU-4DFE and BP4D-Spontaneous datasets.

Conflict of interest

None declared.


References (26)

  • A. Danelakis et al., A survey on facial expression recognition in 3D video sequences, Multimed. Tools Appl. (2014)
  • A. Danelakis, T. Theoharis, I. Pratikakis, Geotopo: dynamic 3D facial expression retrieval using topological and...
  • P. Perakis et al., 3D facial landmark detection under large yaw and expression variations, IEEE Trans. Pattern Anal. Mach. Intell. (2013)

    Antonios Danelakis received his B.Sc. from the Department of Informatics and Telecommunications of the National and Kapodistrian University of Athens in 2008. Subsequently, he received his first M.Sc. from the same Department in the field of Computational Science in 2010 and his second M.Sc. in the field of Medical Informatics in 2012. He is currently a Ph.D. student in the Department of Informatics and Telecommunications at the National and Kapodistrian University of Athens. His research interests lie in the areas of 3D Video Facial Expressions Retrieval, Computer Graphics, Visualization and Medical Informatics.

    Theoharis Theoharis received his D.Phil. in computer graphics and parallel processing from the University of Oxford, U.K., in 1988. He subsequently served as a research fellow at the University of Cambridge, and as a Professor at the University of Athens and at NTNU, Norway. His main research interests lie in the fields of Biometrics, 3D Object Retrieval and Reconstruction. He is the author of a number of textbooks, including Graphics and Visualization: Principles and Algorithms.

    Ioannis Pratikakis is an Assistant Professor at the Department of Electrical and Computer Engineering of Democritus University of Thrace in Xanthi, Greece. He received the Ph.D. degree in 3D Image analysis from the Electronics engineering and Informatics department at Vrije Universiteit Brussel, Belgium, in January 1999. From March 1999 to March 2000, he joined IRISA/ViSTA group, Rennes, France as an INRIA postdoctoral fellow. From January 2003 to June 2010, he was working as Adjunct Researcher at the Institute of Informatics and Telecommunications in the National Centre for Scientific Research “Demokritos”, Athens, Greece. His research interests lie in image processing, pattern recognition, vision and graphics, and more specifically, in document image analysis and recognition, medical image analysis as well as multimedia content analysis, search and retrieval with a particular focus on visual content. He has served as co-chair of the Eurographics Workshop on 3D object retrieval (3DOR) in 2008 and 2009 as well as Guest Editor for the Special issue on 3D object retrieval at the International Journal of Computer Vision. He is a Senior Member of the IEEE, member of the Board of the Hellenic Artificial Intelligence Society for the period 2010–2012 and a member of the European Association for Computer Graphics (Eurographics).

    Panagiotis Perakis received his B.Sc. degree in Physics in 1986, his M.Sc. degree in ICT, in 2008 and his Ph.D. in Computer Science in 2013 from the University of Athens, Greece. Currently, he is a post doctoral fellow at NTNU, Norway. His research interests include computer graphics, computer vision and physics-based modeling. He is also a co-owner of a Greek software development company, since 1993.

    This research has been co-financed by the European Union (European Social Fund – ESF) and Greek National Funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) – Research Funding Program: THALES-3DOR (MIS 379516). Investing in Knowledge Society through the European Social Fund.
