Elsevier

Pattern Recognition

Volume 57, September 2016, Pages 21-30

A Grassmann framework for 4D facial shape analysis

https://doi.org/10.1016/j.patcog.2016.03.013

Highlights

  • Role of facial shape dynamics in identity recognition.

  • Effective representation of 3D faces and their dynamics on Grassmann manifolds.

  • Sparse coding and dictionary learning for 4D face classification.

Abstract

In this paper, we investigate the contribution of the dynamic evolution of 3D faces to identity recognition. To this end, we adopt a subspace representation of the flow of curvature-maps computed on the 3D facial frames of a sequence, after normalizing their pose. Such a representation allows us to embody the shape as well as its temporal evolution within the same subspace representation. Dictionary learning and sparse coding over the space of fixed-dimensional subspaces, called the Grassmann manifold, have been used to perform face recognition. We have conducted extensive experiments on the BU-4DFE dataset. The results obtained with the proposed approach are promising.

Introduction

In recent years, automatic face analysis has attracted increasing interest in the field of computer vision and pattern recognition due to its inherent challenges and its potential in a wide spectrum of applications, including security surveillance [1], [2] and diagnosis of facial pathology [3]. Despite the great progress, 2D face analysis approaches that depend on color or gray-scale image analysis still suffer from illumination and pose variations, which often occur in real-world conditions. With the rapid innovation of 3D cameras, the 3D shape is regarded as a promising alternative to achieve robust face analysis [4], [5]. Very recently, the advent of 4D imaging systems capable of acquiring temporal sequences of 3D scans (i.e., 4D is regarded as 3D over time) has made comprehensive face analysis possible by introducing the temporal dimension, where the temporal behavior of 3D faces is captured by adjacent frames [6], [7]. Note that such temporal information is crucial for analyzing facial deformations. Despite the large amount of work on the analysis of static and dynamic 3D facial scans, temporal modeling is still almost unexplored for identity recognition. Moving from shape analysis of static 3D faces to dynamic faces (4D faces) gives rise to new challenges related to the nature of the data and the processing time: which static and dynamic shape representations are most suited to 4D face analysis? How can the temporal dimension contribute to face analysis? Is it possible to compute statistical summaries on dynamic 3D faces? From the perspective of face classification, which relevant features and classification algorithms can be used?

In this paper, we aim to answer the above questions by proposing a comprehensive framework for modeling and analyzing 3D facial sequences (4D faces), with an experimental illustration in face recognition from 4D sequences.

Recently, works addressing face analysis from temporal sequences of 3D scans have started to appear in the literature, encouraged by advances in 3D sensor technology, with some of them restricted to RGB-D Kinect-like sensors. In [8], Berretti et al. investigated the impact of the resolution of 3D facial scans on the recognition rate by building super-resolution 3D models from consumer depth camera frames. Experimental studies using the new 3D super-resolution method validate the increase in recognition performance with the reconstructed higher-resolution models. Hsu et al. [9] showed that incorporating depth images of the subjects in the gallery can improve the recognition rate, especially in the case of pose variations, even though only 2D still images are available at test time. In the last few years, some works have also addressed face recognition from dynamic sequences of 3D face scans, as in [6], where Sun et al. proposed a 4D-HMM based approach. In this work, a 3D dynamic spatio-temporal face recognition framework is derived by computing a local descriptor based on the curvature values at the vertices of 3D faces. Spatial and temporal HMMs are used for the recognition process, using 22 landmarks manually annotated and tracked over time. As an important achievement, this work also shows that 3D face dynamics provide better results than 2D videos and 3D static scans.

Subspace representation of dynamic facial information, either for image sets or for image sequences (videos), has shown great success. Shigenaka et al. [10] proposed a comparison study of the Grassmann distance mutual subspace method (GD-MSM) and the Grassmann Kernel Support Vector Machine (GK-SVM) for the face recognition problem on a mobile 2D video database. In [11], Lui et al. proposed a geodesic distance based algorithm for face recognition from 2D image sets. Turaga et al. [12] presented a statistical method for video based face recognition. These methods use subspace-based models and tools from the Riemannian geometry of the Grassmann manifold. Intrinsic and extrinsic statistics are derived for maximum-likelihood classification applications. More recently, Huang et al. [13] proposed learning a projection distance on the Grassmann manifold for face recognition from image sets. In this work, improved recognition is obtained by representing every image set using a Gaussian distribution over the manifold.

Sparse representation and dictionary learning have attracted a lot of attention recently, due to their success in many computer vision problems. In [14], a sparse coding framework was presented for face recognition from still images. In this work, Wright et al. showed that, with sparse coding, the choice of feature extraction has little influence on performance, and that sparse coding is more tolerant of face occlusion. Yang et al. [15] proposed a robust sparse coding (RSC) approach for face recognition. In this work, the sparse coding problem is solved as a constrained robust regression, which makes the recognition more robust against occlusion, changes of lighting and expression variation in still images. Elhamifar et al. [16] presented the Sparse Subspace Clustering (SSC) algorithm, which classifies linear subspaces after finding their sparse coding. A generalization of sparse coding and dictionary learning was proposed by Xie et al. [17], which permits its application to data representations that do not have a linear structure, such as Riemannian manifolds. Mapping points from a non-linear manifold to tangent spaces shows good results on the classification of texture and medical images.

In [18], Harandi et al. proposed an extrinsic solution to combine sparse coding and dictionary learning with nonlinear subspaces, like the Grassmann manifold. Embedding the Grassmann manifold into the symmetric matrices’ sub-manifold makes the sparse coding on the induced manifold possible, faster, and more coherent than intrinsic embedding on one or more tangent spaces. Application to 2D video face datasets shows the efficiency of this approach against other learning solutions.
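The extrinsic idea described above can be illustrated with a minimal sketch. It assumes the embedding is the projection mapping X ↦ XXᵀ, a common way to represent a subspace as a symmetric matrix; the text here does not spell out the exact mapping, and the function names below are hypothetical. The point of the sketch is that XXᵀ does not depend on which orthonormal basis is chosen for the subspace, so ordinary Euclidean operations on the embedded matrices are well defined.

```python
import numpy as np


def orthonormal_basis(A):
    """Orthonormal basis (columns) for the column space of A, via reduced QR."""
    Q, _ = np.linalg.qr(A)
    return Q


def projection_embedding(X):
    """Assumed embedding: map an orthonormal basis X (d x k) to the
    symmetric d x d projection matrix X X^T."""
    return X @ X.T


def projection_distance_sq(X, Y):
    """Squared Frobenius distance between the embedded subspaces."""
    diff = projection_embedding(X) - projection_embedding(Y)
    return float(np.sum(diff ** 2))


# Synthetic data only: two random 3-dimensional subspaces of R^20.
rng = np.random.default_rng(0)
X = orthonormal_basis(rng.standard_normal((20, 3)))
Y = orthonormal_basis(rng.standard_normal((20, 3)))

# A different orthonormal basis of the same subspace as X.
R = orthonormal_basis(rng.standard_normal((3, 3)))
X_rotated = X @ R

same_subspace = projection_distance_sq(X, X_rotated)   # numerically zero
different_subspaces = projection_distance_sq(X, Y)     # between 0 and 2k
```

Because the embedded points live in a vector space of symmetric matrices, a linear combination of dictionary atoms is meaningful there, which is what a tangent-space (intrinsic) formulation has to approximate locally.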

Section snippets

Methodology and contributions

In this paper, we investigate the contribution of 3D face dynamics in face recognition. To this end, after a preprocessing step, we compute surface curvature from each 3D static mesh of a sequence, and project it to a 2D map (called a curvature map). A sequence of curvature maps is then cast to a matrix form by reshaping the 2D maps to column vectors. Singular Value Decomposition (SVD) is used to reduce the subspace spanned by the matrix to that of the first k singular vectors, which in turn is

Modeling sequences of 3D faces on Grassmann manifold

The idea of modeling multiple-instances of visual data, like set of images or video sequences, as linear subspaces for classification and recognition tasks has revealed its efficiency in many computer vision problems [12], [19], [20]. This compact low-dimensional data representation has the main advantage in its robustness against noise or missing parts in the original data. Besides, the availability of computational tools from differential geometry makes working on non-linear data (e.g., the

Grassmann sparse representation

Recently, the sparse coding theory showed great success in several topics, like signal processing [23], image classification [24] and face recognition [15], where a given signal or image can be approximated effectively as a combination of few members (atoms) of a dictionary. The success of sparse coding motivated the extension of this approach to the space of linear subspaces [12], in order to represent a subspace as the combination of few subspaces of a dictionary. However, in so doing, the

Identity recognition from 4D faces

To perform face recognition from the 3D facial shapes and their temporal evolution, the flow of curvature-maps is first divided into clips (subsequences) of size w. Then, each clip is modeled as an element on the Grassmann manifold via k-SVD orthogonalization. More formally, given a sequence of curvature-maps $\{m_0, \ldots, m_t\}$, a predefined size of a sliding window w, and a fixed subspace order k, the idea is to consider the maps under the temporal interval $[t-w+1, t]$, and to compute the corresponding
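The windowing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the curvature maps are replaced by synthetic random arrays, the function names are hypothetical, and the subspace of each clip is taken to be the span of the first k left singular vectors of the matrix whose columns are the vectorized maps of that clip.

```python
import numpy as np


def clip_to_subspace(clip, k):
    """Orthonormal basis (d x k) of the dominant k-dimensional subspace of a clip.

    clip: array of shape (w, rows, cols) holding w curvature maps.
    """
    w = clip.shape[0]
    # Vectorize each map and stack the vectors as columns: shape (d, w).
    M = clip.reshape(w, -1).T
    # Thin SVD; the first k left singular vectors are orthonormal.
    U, _, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k]


def sliding_subspaces(sequence, w, k):
    """One subspace per window [t - w + 1, t], for every t with a full window."""
    n_frames = sequence.shape[0]
    return [
        clip_to_subspace(sequence[t - w + 1 : t + 1], k)
        for t in range(w - 1, n_frames)
    ]


# Synthetic stand-in for a sequence of 10 curvature maps of size 8 x 8.
rng = np.random.default_rng(0)
sequence = rng.standard_normal((10, 8, 8))

# Window size w and subspace order k are illustrative values (k <= w required).
subspaces = sliding_subspaces(sequence, w=6, k=3)
```

Each returned matrix has orthonormal columns, so it can be treated as a representative of a point on the Grassmann manifold of k-dimensional subspaces of the d-dimensional map space.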

Experimental results

To investigate the contribution of facial dynamics in identity recognition using 4D data, we conducted extensive experiments on the BU-4DFE dataset. This dataset has been collected at Binghamton University [30] and is currently used in several studies on 4D facial expression recognition. To our knowledge, only the work of Sun et al. [6] has reported identification performance on this dataset. The main characteristics of the BU-4DFE dataset are summarized in Table 2.

Conclusions and future perspectives

In this paper, we have proposed a comprehensive and effective 4D face recognition framework, which adopts a subspace-learning methodology. We have demonstrated that the shape dynamics (behavior) improves the recognition accuracy. This conclusion is valid even if the training samples (in the gallery) and the probes (to be recognized) present a different behavior. Leveraging the geometry of Grassmann manifolds, relevant geometric tools and advanced Machine Learning tools, i.e., dictionary

Conflict of interest

None declared.

Acknowledgment

This research was supported by the Futur & Rupture program of the Institut Mines-Télécom, the MAGNUM project (BPI and Région Nord-Pas de Calais) and the PHC Utique 2016 program for the CMCU project number 34882WK.

References (32)

  • Y.M. Lui, J. Beveridge, B. Draper, M. Kirby, Image-set matching using a geodesic distance and cohort normalization, in:...
  • P. Turaga et al., Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2011)
  • Z. Huang, R. Wang, S. Shan, X. Chen, Projection metric learning on Grassmann manifold with application to video based...
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • M. Yang, D. Zhang, J. Yang, D. Zhang, Robust sparse coding for face recognition, in: IEEE Conference on Computer Vision...
  • E. Elhamifar et al., Sparse subspace clustering: algorithm, theory, and applications, IEEE Trans. Pattern Anal. Mach. Intell. (2013)

    Taleb Alashkar is a research assistant in the Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA. He received his Ph.D. degree in computer science from the University of Lille 1 and his Master's degree in Computer Vision and Image Processing from the University of Dijon in 2015 and 2012, respectively. Before that, he completed a Bachelor of Science degree in Computer Engineering at the University of Aleppo, Syria. His research interests include Computer Vision, Machine Learning and Pattern Recognition.

    Boulbaba Ben Amor has been Associate Professor (with Habilitation) of computer science with the Institut Mines-Télécom/Télécom Lille and a member of the CRIStAL Research Center (UMR CNRS 9189) since 2007. He received the Ph.D. degree from Ecole Centrale de Lyon (France) in 2006. During 2013–2014, he was a visiting research professor at Florida State University (USA). He served as Area Chair for the WACV'16 conference and as Reviewer for several major conferences (ICCV, CVPR, ECCV, ICPR, etc.) and journals (T-PAMI, T-IP, T-IFS, T-Cybernetics, etc.) in computer vision. His research areas include 3D computer vision, 3D/4D shape analysis and pattern recognition.

    Mohamed Daoudi is a Professor of Computer Science at Institut Mines-Télécom/Télécom Lille and the head of the Image group at CRIStAL Laboratory (UMR CNRS 9189), France. He received his Ph.D. degree in Computer Engineering from the University of Lille 1 (France) in 1993 and his Habilitation à Diriger des Recherches from the University of Littoral (France) in 2000. His research interests include pattern recognition, shape analysis, computer vision and 3D object processing. He has published over 150 research papers on these subjects in distinguished peer-reviewed journals and conference proceedings. He is the co-author of several books including 3D Face Modelling, Analysis and Recognition (Wiley 2013) and 3D Object Processing: Compression, Indexing and Watermarking (Wiley 2008). He has been Conference Chair of the Shape Modelling International Conference (2015) and several other national conferences and international workshops. He is a Fellow of IAPR, a Senior Member of IEEE and a member of the Association for Computing Machinery (ACM).

    Stefano Berretti received the Ph.D. in Information and Telecommunications Engineering in 2001 from the University of Florence, Italy. Currently, he is an Associate Professor at the Department of Information Engineering and at the Media Integration and Communication Center of the University of Florence, Italy. His main research interests focus on 3D object retrieval and partitioning, face recognition and facial expression recognition from 3D and 4D data, 3D face super-resolution, and human action recognition from 3D data. He has been a visiting researcher at the Indian Institute of Technology (IIT), in Mumbai, India, and a visiting professor at the Mines-Télécom/Télécom Lille, in Lille, France, and at the Khalifa University, Sharjah, UAE. Stefano Berretti is the author of more than 120 papers that have appeared in conference proceedings and international journals in the area of pattern recognition, computer vision and multimedia. He is on the program committee of several international conferences and serves as a frequent reviewer for many international journals. He was co-chair of the Fifth Workshop on Non-Rigid Shape Analysis and Deformable Image Alignment (NORDIA 2012), held in conjunction with ECCV 2012. Since January 2016 he has been Information Director of the ACM Transactions on Multimedia.
