meshSIFT: Local surface features for 3D face recognition under expression variations and partial data☆
Highlights
► It is a generic method to extract features on multiple scales from 3D surfaces.
► It allows expression-stable 3D face recognition, validated for FRGC and Bosphorus.
► It outperforms other methods for 3D face recognition with missing data on SHREC’11.
► It can robustly normalise 3D face poses and estimate the symmetry plane of the 3D face.
Introduction
Although research in automatic face recognition has been conducted since the 1960s [1], it is still an active research area. Since 2D, image-based face recognition is still hampered by pose variations and varying lighting conditions, recent research has shifted from 2D to 3D face representations. This shift is demonstrated by the establishment of large evaluation studies of 3D face recognition algorithms. In 2006, the Face Recognition Grand Challenge (FRGC) [2] was the first large comparison, followed by the Shape Retrieval Contest (SHREC) in 2007 [3], 2008 [4] and 2011 [5].
Three-dimensional face recognition in real-world scenarios is becoming affordable for security purposes due to technological improvements in 3D surface acquisition devices. However, some important challenges remain, both inherent to 3D face recognition and related to acquisition issues. Inherent challenges are mainly due to intra-subject deformations, often caused by changes in facial expression [6]. Facial muscle contractions cause the soft tissue of the face to deform during expression variations, affecting automatic recognition.
The second challenge is posed by the limited field of view of most 3D scanners, impeding the scanning of the entire face. As a result, 3D face recognition is still pose dependent. In realistic situations, such as with uncooperative subjects or in uncontrolled environments, no assumption can be made about the pose. Therefore, 3D face recognition methods should be able to match partial scans with little or even no overlap. Fig. 1 shows an example of such partial scans of the same individual.
Since excellent surveys exist summarising the extensive work in 3D face recognition [6], [7], we will only review the work on expression-invariant face recognition and on face recognition not requiring overlap.
Expression-invariant 3D face recognition methods can be subdivided into three classes, depending on the way these methods handle expressions.
Historically, the first face recognition methods dealing with expression variations were region-based. These methods rely on parts of the face that remain unaffected during expression variations. The first and most used strategy is to select well-defined anatomic regions based on observations or on literature, such as the region around the nose [8], [9], the cheek [10], chin [10], eyes [8], forehead [8], [11] and the region above the mouth [12]. A second strategy to determine expression-invariant regions is the use of local features. Here, regions defined as local neighbourhoods around points of interest are selected and matched automatically. If a local neighbourhood is small enough, it is assumed to be stable under expression variations. Convex regions [10], Gabor features [13], [14], [15], matched local invariant range images [16], [17], Haar and pyramid wavelet features [18], local shape pattern (LSP) features [19] and local binary patterns (LBPs) [20] all appear to be less affected by expressions. The algorithm presented in this paper belongs to this strategy. The third strategy is the automatic determination of the parts unaffected by expression variations, as determined after alignment/registration as in [21]. Points with a low registration error are considered to belong to an unaffected and thus more rigid part of the face, whereas points with a high registration error are more likely to belong to a part of the face that is affected by expression variations. Alternatively, these regions can be learned using a training database [20]. Related to learning expression-robust regions is the subdivision of the face into small regions. By fusing the results of these different regions (suppressing those affected by expression variations), a high recognition accuracy is achieved [22], [23].
The second major class of expression-invariant face recognition methods uses statistical models. A multivariate Gaussian (principal component analysis (PCA) based) point distribution model can deal with expressions by including faces with expressions in the training data, as in [24], [25], [26]. Expression-induced deformations can also be modelled explicitly using PCA decompositions, leading to ‘principal warps’ as is done in [27], [28]. The former linearly combines this expression model with a PCA shape model for identity, assuming that it is possible to transfer expressions from one face to another. When this assumption is considered to be false, it is necessary to combine the expression model and identity model into a bilinear model as in [29]; however, model fitting then becomes computationally more demanding. Statistical models other than PCA have been suggested as well: independent component analysis (ICA) [24], linear discriminant analysis (LDA) [25] or simply the pointwise mean and standard deviation [30].
The third class of algorithms makes use of an isometric deformation model, in which facial surface changes due to expression variations are modelled as isometric deformations. The most used isometric deformation invariant representations are iso-geodesics, curves containing points at an equal geodesic distance from a reference point (the nose tip), as in [31], [32], [33], [34], [35]. A computationally more demanding representation is the geodesic distance matrix, containing the geodesic distance between each pair of points, as in [36], [37], [38], [39], or between a limited number of points, as in [40], [41].
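To illustrate why the geodesic distance matrix is computationally demanding, it can be approximated by shortest paths along the mesh edge graph. The sketch below is a minimal graph-based approximation, not any of the cited implementations (which typically use more accurate schemes such as fast marching):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distance_matrix(vertices, faces):
    """Approximate all pairwise geodesic distances on a triangle mesh
    by shortest paths along the edge graph (Dijkstra)."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    # Collect the three edges of every triangle.
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    n = len(vertices)
    graph = csr_matrix((lengths, (edges[:, 0], edges[:, 1])), shape=(n, n))
    # directed=False lets every edge be traversed in both directions.
    return dijkstra(graph, directed=False)

# Toy example: a unit square split into two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
tris = [[0, 1, 2], [0, 2, 3]]
D = geodesic_distance_matrix(verts, tris)
```

The quadratic size of `D` (and the O(n² log n) cost of all-pairs Dijkstra) is what makes representations using only a limited number of points attractive.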
A comparative study of 3D recognition methods dealing with expression variations is given in [42], elaborating on the advantages and disadvantages of the different classes. It also provides a meta-analysis in an attempt to compare the classes more quantitatively.
The general strategy to handle partial data is to fit a full face model to the partial scan. In the literature, the Morphable Model (MM) and the Annotated Face Model (AFM) have been used to complete the facial surface. The MM is a statistical shape (and texture) model, originally used to reconstruct 3D faces from 2D photographs [43]. Fitting the 3D shape model (without texture) to a partial 3D scan, however, estimates the most likely 3D face, as shown by van Jole and Veltkamp, and by Claes et al. in [5]. The results of both methods clearly differ, indicating that the results are implementation dependent. Passalis et al. [44], [45] propose a method based on fitting an AFM, which is UV-parametrised and contains annotated facial areas, to each partial scan. The pose, and the areas occluded because of the pose, are detected using an automatic landmark detector. Next, the AFM is fitted to the scan using facial symmetry, resulting in a pose invariant geometry image (a 2D representation of the facial geometry).
Alternatively, Berretti et al. [46] automatically detect and describe features in depth images that are matched, even if a part of the probe scan is missing. This method, however, requires sufficient overlap between probe and gallery scan.
In contrast, the local feature method proposed here uses the intrinsic symmetry of the human face, requiring no overlap and not relying on a full face model to complete the 3D facial surface.
The proposed method, based on the meshSIFT algorithm, is able to perform expression-invariant 3D face recognition in the presence of outliers and missing data. The meshSIFT algorithm extracts features, ranging from fine details to coarse characteristic structures, in a shape-based scale space representation of the surface. The idea behind a scale space representation is to separate the structures in the surface according to their scale. This requires that no new structures are created when moving from a fine to any coarser scale. Describing the features and matching them between two faces allows recognition to be performed based on detailed similarities as well as more global similarities. The meshSIFT algorithm was presented in previous work [47] for the detection of scale space extrema and the construction and matching of local feature descriptors. It is summarised again, with more implementation details, in Section 2. Symmetrising the local feature descriptors, explained in our earlier work [48] and in more detail in Section 3, allows matching partial data based on facial symmetry. The performance is evaluated for expression-invariant 3D face recognition in Section 4 and for 3D face recognition on partial data in Section 5, both using the number of matching features as similarity criterion. Compared to the previous papers [47], [48], this validation is extended. In Section 6, the meshSIFT algorithm is tested for pose normalisation of 3D face scans and symmetry plane estimation, using the matched features and RANSAC to estimate the transformation and symmetry plane, respectively. Section 7, finally, concludes the paper and gives some directions for future work.
Section snippets
MeshSIFT
The Scale Invariant Feature Transform (SIFT), proposed by Lowe [49], [50], has been shown to be a very powerful technique to extract distinctive invariant features from images and is applied to different problems in 2D computer vision such as image stitching [51], robot navigation and tracking [52], object recognition [49], 3D reconstruction and so forth. Triggered by the success of SIFT in 2D computer vision, there have been several attempts to extend the algorithm to three dimensions. N-SIFT
Symmetric meshSIFT
To compare face scans with limited or no overlap, such as the scans in Fig. 1, the meshSIFT algorithm is adapted. As the feature descriptor is not symmetrical, features on one face are not matched with their symmetrical counterpart. As a result, no matching features are found between scans with no overlap. The relevant symmetry here is reflection symmetry because of the left–right symmetry in human faces. Although mild facial asymmetries are common in typical growth and development [62], it is
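The role of reflection symmetry can be illustrated geometrically: matching a scan against the mirror image of another amounts to a Householder reflection of one surface through the symmetry plane. The sketch below mirrors a mesh through a plane that is assumed known; the paper instead makes the descriptor itself symmetric, so this is an illustration of the underlying geometry, not the proposed algorithm:

```python
import numpy as np

def mirror_mesh(vertices, faces, plane_normal=(1.0, 0.0, 0.0)):
    """Reflect a mesh through a plane (through the origin) with the given
    normal; triangle winding is flipped so that the surface orientation
    (outward normals) is preserved after the reflection."""
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    V = np.asarray(vertices, dtype=float)
    # Householder reflection: v' = v - 2 (v . n) n
    V_mirrored = V - 2.0 * (V @ n)[:, None] * n
    F_mirrored = np.asarray(faces)[:, ::-1]  # flip winding order
    return V_mirrored, F_mirrored

# Single triangle reflected through the plane x = 0.
verts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
tris = np.array([[0, 1, 2]])
Vm, Fm = mirror_mesh(verts, tris)
```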
Data
To demonstrate the effectiveness of the meshSIFT algorithm for expression-invariant face recognition, it is validated on the Bosphorus database [59] and the FRGC databases [2]. The Bosphorus database consists of 4666 scans from 105 subjects and was acquired with the “Inspeck Mega Capturor II 3D” scanner, leading to 3D point clouds of approximately 35,000 points. The database contains expression variations, pose variations and occlusions. The 3D scans of the FRGC databases, which are 640 by
Data
To demonstrate the effectiveness of the proposed symmetric methods, we performed the validation experiment of the “SHREC’11-SHape REtrieval Contest for 3D Face Scans” [5], which aims to evaluate the performance of different 3D face recognition techniques. The dataset used contains scans from an anthropological collection of 130 approximately 100-year-old masks. The dataset is divided into a training set of 60 high quality scans, a test set of 70 high quality and 580 low quality
Pose normalisation
Because the angle at which a face is scanned cannot always be determined at scan time, 3D face scans show variation in head pose. This is usually the first correction that has to be made in 3D face preprocessing. Because of the 3D nature of the face scans, pose normalisation comes down to determining a rigid transformation matrix. In this experiment, we estimate this pose by matching meshSIFT features between the faces that need to be pose normalised. To increase the number of matches,
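Given a set of matched feature positions, the rigid transformation inside each RANSAC iteration can be estimated in closed form. The sketch below uses the standard SVD-based (Kabsch) least-squares fit; this is an assumed building block for illustration, not a quote from the paper's implementation:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q,
    using the SVD-based Kabsch algorithm."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Toy check: rotate a point set 90 degrees about z, translate, and recover.
rng = np.random.default_rng(0)
P = rng.standard_normal((10, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_transform(P, Q)
```

In a RANSAC loop, this fit would be applied to random minimal subsets of the matches and scored by the number of inlier correspondences, making the pose estimate robust to mismatched features.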
Conclusion
The proposed local feature method, called meshSIFT, detects salient points as extrema in a scale space, assigns a canonical orientation to the salient points based on the surface normals in the scale-dependent local neighbourhood and describes these salient points in a feature vector containing concatenated histograms of slant angles and shape indices. Since the descriptors are computed in local neighbourhoods that are approximately preserved during expression variations, they allow for
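One of the two histogram types in the descriptor is based on the shape index of Koenderink and van Doorn, computed from the principal curvatures. A minimal sketch of that ingredient follows; curvature estimation itself and the slant-angle histograms are omitted, and the bin count is an illustrative choice rather than the paper's setting:

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink's shape index in [-1, 1]: +1 for a convex cap,
    -1 for a concave cup, 0 for a perfect saddle. Assumes k1 >= k2."""
    k1, k2 = np.asarray(k1, float), np.asarray(k2, float)
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def shape_index_histogram(k1, k2, bins=8):
    """Normalised histogram of shape indices over a local neighbourhood,
    the kind of histogram concatenated in a meshSIFT-style descriptor."""
    s = shape_index(k1, k2)
    hist, _ = np.histogram(s, bins=bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)

# Saddle points (k1 = -k2) map to shape index 0.
s = shape_index(np.array([1.0]), np.array([-1.0]))
```

Because the shape index depends only on the ratio of curvatures, it is invariant to scale, which fits the scale space design of the detector.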
Acknowledgments
This work is supported by the Flemish Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Programme of the Fund for Scientific Research-Flanders (Belgium) (FWO) and the Research Fund K.U. Leuven.
We would also like to acknowledge our former colleague, Thomas Fabry, and our former master’s thesis student, Chris Maes, for their contributions to this work.
The source code will be made publicly available for academic research purposes soon after publication.
References (74)
- et al., A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition, Computer Vision and Image Understanding (2006)
- et al., Automatic 3D face recognition from depth and intensity Gabor features, Pattern Recognition (2009)
- et al., A 3D face matching framework for facial curves, Graphical Models (2009)
- et al., Isometric deformation invariant 3D shape recognition, Pattern Recognition (2012)
- et al., Local velocity-adapted motion events for spatio-temporal recognition, Computer Vision and Image Understanding (2007)
- et al., Local feature extraction and matching on range images: 2.5D SIFT, Computer Vision and Image Understanding (2009)
- et al., 2.5D face recognition using patch geodesic moments, Pattern Recognition (2012)
- W.W. Bledsoe, The model method in facial recognition, Technical Report PRI 15, Panoramic Research, Inc., Palo Alto,...
- et al., Overview of the face recognition grand challenge
- R.C. Veltkamp, F. ter Haar, SHREC 2007 – shape retrieval contest of 3D face models, 2007....
- A survey of 3D face recognition methods
- Multiple nose region matching for 3D face recognition under varying facial expression, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Fusion of multiple facial regions for expression-invariant face recognition
- SHREC’08 entry: 3D face recognition using integral shape information
- Combined 2D/3D face recognition using log-Gabor templates
- 3D face recognition using log-Gabor templates
- Face recognition using 2D and 3D multimodal local features
- An efficient multimodal 2D–3D hybrid approach to automatic face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 3D face recognition based on local shape patterns and sparse representation classifier
- Learning weighted sparse representation of encoded facial normal information for expression-robust 3D face recognition
- Exploring facial expression effects in 3D face recognition using partial ICP
- A region ensemble for 3-D face recognition, IEEE Transactions on Information Forensics and Security
- Fast and accurate 3D face recognition, International Journal of Computer Vision
- A novel technique for face recognition using range imaging
- 3D face recognition using 3D alignment for PCA
- Expression invariant 3D face recognition with a morphable model
- An expression deformation approach to non-rigid 3D face recognition, International Journal of Computer Vision
- An efficient 3D face recognition algorithm
- Description and retrieval of 3D face models using iso-geodesic stripes
☆ This paper has been recommended for acceptance by L.S. Davis.