meshSIFT: Local surface features for 3D face recognition under expression variations and partial data☆
Highlights
► It is a generic method to extract features on multiple scales from 3D surfaces.
► It allows expression-stable 3D face recognition, validated for FRGC and Bosphorus.
► It outperforms other methods for 3D face recognition with missing data on SHREC’11.
► It can robustly normalise 3D face poses and estimate the symmetry plane of the 3D face.
Introduction
Although research in automatic face recognition has been conducted since the 1960s [1], it is still an active research area. Since 2D, image-based face recognition is still hampered by pose variations and varying lighting conditions, recent research has shifted from 2D to 3D face representations. This shift is demonstrated by the establishment of large evaluation studies of 3D face recognition algorithms. In 2006, the Face Recognition Grand Challenge (FRGC) [2] was the first large comparison, followed by the Shape Retrieval Contest (SHREC) in 2007 [3], 2008 [4] and 2011 [5].
Three-dimensional face recognition in real-world scenarios is becoming affordable for security purposes due to technological improvements in 3D surface acquisition devices. However, some important challenges remain, both inherent to 3D face recognition and related to acquisition issues. Inherent challenges are mainly due to intra-subject deformations, often caused by changes in facial expression [6]. Facial muscle contractions cause the soft tissue of the face to deform during expression variations, affecting automatic recognition.
The second challenge is posed by the limited field of view of most 3D scanners, impeding the scanning of the entire face. As a result, 3D face recognition is still pose dependent. In realistic situations, such as with uncooperative subjects or in uncontrolled environments, no assumption can be made about the pose. Therefore, 3D face recognition methods should be able to match partial scans with little or even no overlap. Fig. 1 shows an example of such partial scans of the same individual.
Since excellent surveys exist summarising the extensive work in 3D face recognition [6], [7], we will only review the work on expression-invariant face recognition and on face recognition not requiring overlap.
Expression-invariant 3D face recognition methods can be subdivided into three classes, depending on the way these methods handle expressions.
Historically, the first face recognition methods dealing with expression variations were region-based. These methods rely on parts of the face that remain unaffected during expression variations. The first and most used strategy is to select well-defined anatomic regions based on observations or on literature, such as the region around the nose [8], [9], the cheek [10], chin [10], eyes [8], forehead [8], [11] and the region above the mouth [12]. A second strategy to determine expression-invariant regions is the use of local features. Here, regions defined as local neighbourhoods around points of interest are selected and matched automatically. If a local neighbourhood is small enough, it is assumed to be stable under expression variations. Convex regions [10], Gabor features [13], [14], [15], matched local invariant range images [16], [17], Haar and pyramid wavelet features [18], local shape pattern (LSP) features [19] and local binary patterns (LBPs) [20] all appear to be less affected by expressions. The algorithm presented in this paper belongs to this strategy. The third strategy is the automatic determination of the parts unaffected by expression variations, as determined after alignment/registration as in [21]. Points with a low registration error are considered to belong to an unaffected and thus more rigid part of the face, whereas points with a high registration error are more likely to belong to a part of the face that is affected by expression variations. Alternatively, these regions can be learned using a training database [20]. Related to learning expression-robust regions is the subdivision of the face into small regions. By fusing the results of these different regions (suppressing those affected by expression variations), a high recognition accuracy is achieved [22], [23].
The second major class of expression-invariant face recognition methods uses statistical models. A multivariate Gaussian (principal component analysis (PCA) based) point distribution model can deal with expressions by including faces with expressions in the training data, as in [24], [25], [26]. Expression-induced deformations can also be modelled explicitly using PCA decompositions, leading to ‘principal warps’ as is done in [27], [28]. The former linearly combines this expression model with a PCA shape model for identity, assuming that it is possible to transfer expressions from one face to another. When this assumption is considered to be false, it is necessary to combine the expression model and identity model into a bilinear model as in [29]; however, model fitting then becomes computationally more demanding. Statistical models other than PCA have been suggested as well: independent component analysis (ICA) [24], linear discriminant analysis (LDA) [25] or simply the pointwise mean and standard deviation [30].
The third class of algorithms makes use of an isometric deformation model, in which facial surface changes due to expression variations are modelled as isometric deformations. The most used isometric deformation invariant representations are iso-geodesics, curves containing points at an equal geodesic distance from a reference point (the nose tip), as in [31], [32], [33], [34], [35]. A computationally more demanding representation is the geodesic distance matrix, containing the geodesic distance between each pair of points, as in [36], [37], [38], [39], or between a limited number of points, as in [40], [41].
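To illustrate why the geodesic distance matrix is computationally demanding, it can be approximated by shortest paths along the mesh edge graph. The sketch below is a minimal graph-based approximation, not any of the cited implementations (which typically use more accurate schemes such as fast marching):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distance_matrix(vertices, faces):
    """Approximate all pairwise geodesic distances on a triangle mesh
    by shortest paths along the edge graph (Dijkstra)."""
    vertices = np.asarray(vertices, dtype=float)
    faces = np.asarray(faces, dtype=int)
    # Collect the three edges of every triangle.
    edges = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    lengths = np.linalg.norm(vertices[edges[:, 0]] - vertices[edges[:, 1]], axis=1)
    n = len(vertices)
    graph = csr_matrix((lengths, (edges[:, 0], edges[:, 1])), shape=(n, n))
    # directed=False lets every edge be traversed in both directions.
    return dijkstra(graph, directed=False)

# Toy example: a unit square split into two triangles.
verts = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
tris = [[0, 1, 2], [0, 2, 3]]
D = geodesic_distance_matrix(verts, tris)
```

The quadratic size of `D` (and the O(n² log n) cost of all-pairs Dijkstra) is what makes representations using only a limited number of points attractive.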
A comparative study of 3D recognition methods dealing with expression variations is given in [42], elaborating on the advantages and disadvantages of the different classes. It also provides a meta-analysis in an attempt to compare the classes more quantitatively.
The general strategy to handle partial data is to fit a full face model to the partial scan. In the literature, the Morphable Model (MM) and the Annotated Face Model (AFM) have been used to complete the facial surface. The MM is a statistical shape (and texture) model, originally used to reconstruct 3D faces from 2D photographs [43]. Fitting the 3D shape model (without texture) to a partial 3D scan, however, estimates the most likely 3D face, as shown by van Jole and Veltkamp, and by Claes et al. in [5]. The results of both methods clearly differ, indicating that the results are implementation dependent. Passalis et al. [44], [45] propose a method based on fitting an AFM, which is UV-parametrised and contains annotated facial areas, to each partial scan. The pose, and the areas occluded because of the pose, are detected using an automatic landmark detector. Next, the AFM is fitted to the scan using facial symmetry, resulting in a pose invariant geometry image (a 2D representation of the facial geometry).
Alternatively, Berretti et al. [46] automatically detect and describe features in depth images that are matched, even if a part of the probe scan is missing. This method, however, requires sufficient overlap between probe and gallery scan.
In contrast, the local feature method proposed here uses the intrinsic symmetry of the human face, requiring no overlap and not relying on a full face model to complete the 3D facial surface.
The proposed method, based on the meshSIFT algorithm, is able to perform expression-invariant 3D face recognition in the presence of outliers and missing data. The meshSIFT algorithm extracts features, ranging from fine details to coarse characteristic structures, in a shape-based scale space representation of the surface. The idea behind a scale space representation is to separate the structures in the surface according to their scale. This requires that no new structures are created when moving from a fine to any coarser scale. Describing the features and matching them between two faces allows recognition to be performed based on detailed similarities as well as more global similarities. The meshSIFT algorithm was presented in previous work [47] for the detection of scale space extrema and the construction and matching of local feature descriptors. It is summarised again, with more implementation details, in Section 2. Symmetrising the local feature descriptors, explained in our earlier work [48] and in more detail in Section 3, allows matching partial data based on facial symmetry. The performance is evaluated for expression-invariant 3D face recognition in Section 4 and for 3D face recognition on partial data in Section 5, both using the number of matching features as similarity criterion. Compared to the previous papers [47], [48], this validation is extended. In Section 6, the meshSIFT algorithm is tested for pose normalisation of 3D face scans and symmetry plane estimation, using the matched features and RANSAC to estimate the transformation and symmetry plane, respectively. Section 7, finally, concludes the paper and gives some directions for future work.
Section snippets
MeshSIFT
The Scale Invariant Feature Transform (SIFT), proposed by Lowe [49], [50], has been shown to be a very powerful technique to extract distinctive invariant features from images and is applied to different problems in 2D computer vision such as image stitching [51], robot navigation and tracking [52], object recognition [49], 3D reconstruction and so forth. Triggered by the success of SIFT in 2D computer vision, there have been several attempts to extend the algorithm to three dimensions. N-SIFT
Symmetric meshSIFT
To compare face scans with limited or no overlap, such as the scans in Fig. 1, the meshSIFT algorithm is adapted. As the feature descriptor is not symmetrical, features on one face are not matched with their symmetrical counterpart. As a result, no matching features are found between scans with no overlap. The relevant symmetry here is reflection symmetry because of the left–right symmetry in human faces. Although mild facial asymmetries are common in typical growth and development [62], it is
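The role of reflection symmetry can be illustrated geometrically: matching a scan against the mirror image of another amounts to a Householder reflection of one surface through the symmetry plane. The sketch below mirrors a mesh through a plane that is assumed known; the paper instead makes the descriptor itself symmetric, so this is an illustration of the underlying geometry, not the proposed algorithm:

```python
import numpy as np

def mirror_mesh(vertices, faces, plane_normal=(1.0, 0.0, 0.0)):
    """Reflect a mesh through a plane (through the origin) with the given
    normal; triangle winding is flipped so that the surface orientation
    (outward normals) is preserved after the reflection."""
    n = np.asarray(plane_normal, dtype=float)
    n /= np.linalg.norm(n)
    V = np.asarray(vertices, dtype=float)
    # Householder reflection: v' = v - 2 (v . n) n
    V_mirrored = V - 2.0 * (V @ n)[:, None] * n
    F_mirrored = np.asarray(faces)[:, ::-1]  # flip winding order
    return V_mirrored, F_mirrored

# Single triangle reflected through the plane x = 0.
verts = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
tris = np.array([[0, 1, 2]])
Vm, Fm = mirror_mesh(verts, tris)
```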
Data
To demonstrate the effectiveness of the meshSIFT algorithm for expression-invariant face recognition, it is validated on the Bosphorus database [59] and the FRGC databases [2]. The Bosphorus database consists of 4666 scans from 105 subjects and was acquired with the “Inspeck Mega Capturor II 3D” scanner, leading to 3D point clouds of approximately 35,000 points. The database contains expression variations, pose variations and occlusions. The 3D scans of the FRGC databases, which are 640 by
Data
To demonstrate the effectiveness of the proposed symmetric methods, we performed the validation experiment of the “SHREC’11-SHape REtrieval Contest for 3D Face Scans” [5], which aims to evaluate the performance of different 3D face recognition techniques. The dataset used contains scans from an anthropological collection of 130 approximately 100-year-old masks. The dataset is divided into a training set of 60 high quality scans, a test set of 70 high quality and 580 low quality
Pose normalisation
Because the angle at which a face is scanned cannot always be determined at scan time, 3D face scans show variation in head pose. This is usually the first correction that has to be made in 3D face preprocessing. Because of the 3D nature of the face scans, pose normalisation comes down to determining a rigid transformation matrix. In this experiment, we estimate this pose by matching meshSIFT features between the faces that need to be pose normalised. To increase the number of matches,
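Given a set of matched feature positions, the rigid transformation inside each RANSAC iteration can be estimated in closed form. The sketch below uses the standard SVD-based (Kabsch) least-squares fit; this is an assumed building block for illustration, not a quote from the paper's implementation:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P onto Q,
    using the SVD-based Kabsch algorithm."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)               # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Toy check: rotate a point set 90 degrees about z, translate, and recover.
rng = np.random.default_rng(0)
P = rng.standard_normal((10, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t = rigid_transform(P, Q)
```

In a RANSAC loop, this fit would be applied to random minimal subsets of the matches and scored by the number of inlier correspondences, making the pose estimate robust to mismatched features.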
Conclusion
The proposed local feature method, called meshSIFT, detects salient points as extrema in a scale space, assigns a canonical orientation to the salient points based on the surface normals in the scale-dependent local neighbourhood and describes these salient points in a feature vector containing concatenated histograms of slant angles and shape indices. Since the descriptors are computed in local neighbourhoods that are approximately preserved during expression variations, they allow for
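One of the two histogram types in the descriptor is based on the shape index of Koenderink and van Doorn, computed from the principal curvatures. A minimal sketch of that ingredient follows; curvature estimation itself and the slant-angle histograms are omitted, and the bin count is an illustrative choice rather than the paper's setting:

```python
import numpy as np

def shape_index(k1, k2):
    """Koenderink's shape index in [-1, 1]: +1 for a convex cap,
    -1 for a concave cup, 0 for a perfect saddle. Assumes k1 >= k2."""
    k1, k2 = np.asarray(k1, float), np.asarray(k2, float)
    return (2.0 / np.pi) * np.arctan2(k1 + k2, k1 - k2)

def shape_index_histogram(k1, k2, bins=8):
    """Normalised histogram of shape indices over a local neighbourhood,
    the kind of histogram concatenated in a meshSIFT-style descriptor."""
    s = shape_index(k1, k2)
    hist, _ = np.histogram(s, bins=bins, range=(-1.0, 1.0))
    return hist / max(hist.sum(), 1)

# Saddle points (k1 = -k2) map to shape index 0.
s = shape_index(np.array([1.0]), np.array([-1.0]))
```

Because the shape index depends only on the ratio of curvatures, it is invariant to scale, which fits the scale space design of the detector.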
Acknowledgments
This work is supported by the Flemish Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Programme of the Fund for Scientific Research-Flanders (Belgium) (FWO) and the Research Fund K.U. Leuven.
We would also like to acknowledge our former colleague, Thomas Fabry, and our former master’s thesis student, Chris Maes, for their contributions to this work.
The source code will be made publicly available for academic research purposes soon after publication.
References (74)
- et al., A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition, Computer Vision and Image Understanding (2006)
- et al., Automatic 3D face recognition from depth and intensity Gabor features, Pattern Recognition (2009)
- et al., A 3D face matching framework for facial curves, Graphical Models (2009)
- et al., Isometric deformation invariant 3D shape recognition, Pattern Recognition (2012)
- et al., Local velocity-adapted motion events for spatio-temporal recognition, Computer Vision and Image Understanding (2007)
- et al., Local feature extraction and matching on range images: 2.5D SIFT, Computer Vision and Image Understanding (2009)
- et al., 2.5D face recognition using patch geodesic moments, Pattern Recognition (2012)
- W.W. Bledsoe, The model method in facial recognition, Technical Report PRI 15, Panoramic Research, Inc., Palo Alto,...
- et al., Overview of the face recognition grand challenge
- R.C. Veltkamp, F. ter Haar, SHREC 2007 – shape retrieval contest of 3D face models, 2007....
- A survey of 3D face recognition methods
- Multiple nose region matching for 3D face recognition under varying facial expression, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Fusion of multiple facial regions for expression-invariant face recognition
- SHREC’08 entry: 3D face recognition using integral shape information
- Combined 2D/3D face recognition using log-Gabor templates
- 3D face recognition using log-Gabor templates
- Face recognition using 2D and 3D multimodal local features
- An efficient multimodal 2D–3D hybrid approach to automatic face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence
- Three-dimensional face recognition in the presence of facial expressions: an annotated deformable model approach, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 3D face recognition based on local shape patterns and sparse representation classifier
- Learning weighted sparse representation of encoded facial normal information for expression-robust 3D face recognition
- Exploring facial expression effects in 3D face recognition using partial ICP
- A region ensemble for 3-D face recognition, IEEE Transactions on Information Forensics and Security
- Fast and accurate 3D face recognition, International Journal of Computer Vision
- A novel technique for face recognition using range imaging
- 3D face recognition using 3D alignment for PCA
- Expression invariant 3D face recognition with a morphable model
- An expression deformation approach to non-rigid 3D face recognition, International Journal of Computer Vision
- An efficient 3D face recognition algorithm
- Description and retrieval of 3D face models using iso-geodesic stripes
☆ This paper has been recommended for acceptance by L.S. Davis.