Synonyms
Face matching; Face registration
Definition
Face alignment is a computer vision technique for identifying the geometric structure of human faces in digital images. Given the location and size of a face, it automatically determines the shape of face components such as the eyes and nose. A face alignment program typically operates by iteratively adjusting a deformable model, which encodes prior knowledge of face shape or appearance, so as to account for the low-level image evidence and locate the face that is present in the image.
Introduction
The ability to understand and interpret facial structure is important for many image analysis tasks. Suppose we want to identify a person from a surveillance camera: a natural approach would be to compare the person's face image against a database of known faces, examine the differences, and pick the best match. However, simply subtracting one image from another does not yield the desired differences (as shown in Fig. 1) unless the two faces are properly aligned. The goal of face alignment is to establish correspondence among different faces so that subsequent image analysis tasks can be performed on a common basis.
The main challenge in face alignment arises from pervasive ambiguities in low-level image features. Consider the examples shown in Fig. 2. While the main face structures are present in the feature maps, the contours of face components are frequently disrupted by gaps or corrupted by spurious fragments. Strong gradient responses can be due to reflectance, occlusion, fine facial texture, or background clutter; meanwhile, the boundaries of face components such as the nose and eyebrows are often obscure and incomplete. Searching for face components separately is therefore difficult and often yields noisy results.
Rather than searching for individual face components and expecting the face structure to emerge from the results, a better strategy is to impose the structure explicitly from the beginning. The majority of work in the field is built on this strategy. The deformable template [1], for example, is an elastic model that represents face structure as an assembly of flexible curves. A set of model parameters controls shape details such as the locations of various facial subparts and the angles of the hinges that join them. The model is imposed upon and aligned to an image by varying these parameters. This strategy is powerful for resolving low-level image ambiguities. Inspired by this work, many variations of deformable face models emerged, including [2–9]. The common scheme in this line of work is to first construct a generic face model and then modify it to match the facial features found in a particular image. In this procedure, encoding prior knowledge of human faces, collecting image evidence of facial features, and fusing the observations with the priors are the three key problems. Our treatment follows the method proposed by Gu and Kanade [8, 9], which addresses these problems in a coherent hierarchical Bayes framework.
Constructing Face Priors
This article concerns prior knowledge of a particular kind, namely, shape priors. Suppose that a face is represented by a set of landmark points, typically placed along the boundaries of face components, i.e., \(S = (x_{1},y_{1},\ldots,x_{n},y_{n})\). It can be viewed as a random vector, and its distribution, commonly called the shape prior, describes the plausible spatial configurations of the landmark set. A principled way to construct the prior is to learn the distribution from training samples.
Faces appear at different scales and orientations, so we first need to transform all training face shapes into a common coordinate frame. One popular approach is generalized Procrustes analysis [10]. It consists of two alternating steps: computing the mean shape, and aligning each training shape to the mean by a rigid transformation. The two steps are repeated until the differences between the mean and the training shapes are minimized.
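Concretely, the two alternating steps can be sketched as follows with NumPy. This is an illustrative implementation, not the exact algorithm of [10]; the function names and the unit-norm reference convention are our own choices:

```python
import numpy as np

def align_rigid(shape, target):
    """Align one (n, 2) landmark set to `target` by the least-squares
    similarity transform (orthogonal Procrustes fit)."""
    s = shape - shape.mean(axis=0)        # center the source shape
    t = target - target.mean(axis=0)      # center the target shape
    u, sv, vt = np.linalg.svd(s.T @ t)    # optimal rotation via SVD
    r = u @ vt                            # (reflection handling omitted for brevity)
    scale = sv.sum() / (s * s).sum()      # least-squares scale factor
    return scale * s @ r + target.mean(axis=0)

def generalized_procrustes(shapes, n_iter=10):
    """Alternate between (1) aligning every shape to the current mean and
    (2) recomputing the mean, until the common frame stabilizes."""
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean /= np.linalg.norm(mean)          # fix the scale of the reference frame
    for _ in range(n_iter):
        shapes = [align_rigid(s, mean) for s in shapes]
        mean = np.mean(shapes, axis=0)
        mean /= np.linalg.norm(mean)      # renormalize so the mean cannot shrink
    return shapes, mean
```

In practice a convergence test on the change of the mean shape would replace the fixed iteration count.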
Next, we construct shape priors from the aligned training samples. The spatial arrangement of facial landmarks, although deformable, has to satisfy certain constraints. For example, it is often reasonable to assume that face shape is normally distributed; to learn the distribution, we then simply compute the mean and the covariance of the training shapes. More specifically, since the intrinsic variability of face structure is independent of its representation, e.g., the number of landmarks, we can parameterize face shape in a low-dimensional subspace [6, 8], such as

\(S =\mu +\varPhi b+\epsilon \qquad \mathrm{(1)}\)
The columns of Φ denote the major “modes” of shape deformation, and the elements of b control the magnitude of deformation along the corresponding modes. This model has a nice generative interpretation: the shape vector S is generated by first adding a sequence of deformations \(\{\varPhi _{i}b_{i}\}\) to the mean shape μ, then perturbing the resultant shape by a Gaussian noise \(\epsilon \sim \mathcal{N}\left (0,\sigma ^{2}I\right )\). From a geometric perspective, the columns of Φ span a low-dimensional subspace centered at μ, the deformation coefficient vector b is the projection of S onto the subspace, and ε denotes the deviation of S from the subspace. Assuming the elements of b are independently normal, i.e., \(b \sim \mathcal{N}\left (0,\varSigma \right )\) with Σ diagonal, the distribution over the shape S is a constrained Gaussian, \(S \sim \mathcal{N}\left (\mu,\varPhi \varSigma \varPhi ^{t} +\sigma ^{2}I\right )\). The model parameters μ, Φ, Σ, and σ can be learned from training data. This model is also known as probabilistic principal component analysis [11] in the machine learning literature.
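Under these assumptions, fitting the prior reduces to an eigen-analysis of the aligned training shapes. The sketch below (our own illustrative code, following the maximum-likelihood solution of probabilistic PCA [11]) estimates μ, Φ, Σ, and σ² from a matrix of flattened landmark vectors and draws samples from the resulting prior:

```python
import numpy as np

def fit_shape_prior(X, k):
    """Fit the shape prior S ~ N(mu, Phi Sigma Phi^T + sigma2 I).
    X: (m, 2n) matrix with one aligned, flattened landmark vector per row.
    k: number of deformation modes to keep."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)      # sample covariance of the shapes
    lam, vec = np.linalg.eigh(cov)
    lam, vec = lam[::-1], vec[:, ::-1]      # sort eigenpairs in descending order
    sigma2 = lam[k:].mean()                 # noise variance: mean of discarded eigenvalues
    Phi = vec[:, :k]                        # major deformation modes (orthonormal columns)
    Sigma = np.diag(lam[:k] - sigma2)       # prior variances of the coefficients b
    return mu, Phi, Sigma, sigma2

def sample_shape(mu, Phi, Sigma, sigma2, rng):
    """Draw S = mu + Phi b + eps with b ~ N(0, Sigma), eps ~ N(0, sigma2 I)."""
    b = rng.multivariate_normal(np.zeros(Phi.shape[1]), Sigma)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=mu.shape)
    return mu + Phi @ b + eps
```

Keeping only the k leading modes discards representation-level redundancy; the variance of the discarded modes is absorbed into the isotropic noise term σ².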
Detecting Facial Features
Strong gradient response is not the only way to characterize facial features. Some feature points may correspond to a weaker secondary edge in their local context rather than the strongest one; other points, such as eye corners, may have rich image structure that is more informative than gradient magnitude. Facial feature modeling can be made more effective by constructing detectors specific to each individual feature. One simple detector [2], for example, is a normal distribution built on the local gradient structure around each point. The distribution is learned from training face images and applied to evaluate the target image. Concatenating the best candidate position \((u_{i},v_{i})\) of each feature point, we obtain an “observation” \(Q = (u_{1},v_{1},\ldots,u_{n},v_{n})\) of the face shape that is likely to be present in the image. The observation is related to the aligned shape S by a rigid transformation

\(Q = \mathcal{T}_{\theta }(S)+\eta \qquad \mathrm{(2)}\)
where θ = {t, s, r} denotes the transformation parameters (translation, scale, and rotation) and η is additive observation noise. The conditional \(p(Q\ \vert \ S)\) remains normal if the transformation \(\mathcal{T}\) is linear, e.g., rigid or affine. More sophisticated detectors have been developed to produce better observations; however, after decades of research, it has become clear that individual feature detectors are effective only up to a point and cannot be expected to recover the entire face structure.
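A detector of the simple kind described above might be sketched as follows. This is illustrative code, not the detector of [2]; `patch_fn` is a hypothetical callback that extracts a local appearance vector (e.g., a gradient profile) at a given image position:

```python
import numpy as np

def fit_patch_model(patches):
    """Learn a Gaussian over the local appearance of one landmark.
    patches: (m, d) matrix, one training appearance vector per row."""
    mean = patches.mean(axis=0)
    cov = np.cov(patches, rowvar=False) + 1e-6 * np.eye(patches.shape[1])
    return mean, np.linalg.inv(cov)        # store the inverse for fast scoring

def mahalanobis_score(patch, mean, cov_inv):
    """Squared Mahalanobis distance to the learned appearance; lower is better."""
    d = patch - mean
    return d @ cov_inv @ d

def detect_landmark(image, model, center, half=5, patch_fn=None):
    """Search a (2*half+1)^2 window around `center` for the position whose
    local appearance best matches the learned Gaussian."""
    mean, cov_inv = model
    best, best_pos = np.inf, center
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            pos = (center[0] + dx, center[1] + dy)
            score = mahalanobis_score(patch_fn(image, pos), mean, cov_inv)
            if score < best:
                best, best_pos = score, pos
    return best_pos
```

Running one such detector per landmark and concatenating the winning positions yields the observation Q described above.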
Fusing Priors with Image Observations
Combining the deformation model (1) with the transformation model (2) establishes a hierarchical Bayes model that describes how a random observation Q is generated from the deformation coefficients b and the transformation parameters θ. In this framework, the face alignment task is to modify the shape prior to take the image evidence into account, arriving at the target face shape in the image. The EM algorithm is typically used to infer the posteriors of b and θ, and analytic solutions exist for both the E and M steps when the transformation is linear. This framework has been extended to model three-dimensional transformations for aligning multiview faces [8] and nonlinear shape deformations for handling face images with exaggerated facial expressions [9]. Figure 3 shows a few alignment results from [9].
Summary
Significant progress has been made in face alignment in recent years. The hierarchical Bayes formulation introduced in this article provides a systematic way to resolve low-level image ambiguities and exploit prior knowledge. Face alignment has a wide range of applications, including face recognition, expression analysis, facial animation, lip reading, and human-computer interaction.
References
1. A.L. Yuille, P.W. Hallinan, D.S. Cohen, Feature extraction from faces using deformable templates. Int. J. Comput. Vis. 8(2), 99–111 (1992). doi:http://dx.doi.org/10.1007/BF00127169. http://www.stat.ucla.edu/~yuille/pubs/optimize_papers/DT_IJCV1992.pdf
2. T.F. Cootes, C. Taylor, D. Cooper, J. Graham, Active shape models – their training and application. Comput. Vis. Image Underst. 61(1), 38–59 (1995)
3. L. Wiskott, J.M. Fellous, N. Kruger, C. von der Malsburg, Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 775–779 (1997). doi:http://dx.doi.org/10.1109/34.598235. http://www.face-rec.org/algorithms/EBGM/WisFelKrue99-FaceRecognition-JainBook.pdf
4. V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in ACM SIGGRAPH, Los Angeles, 1999
5. T. Cootes, G. Edwards, C. Taylor, Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
6. Y. Zhou, L. Gu, H. Zhang, Bayesian tangent shape model: estimating shape and pose parameters via Bayesian inference, in CVPR, Madison, vol. I, 2003, pp. 109–116. http://www.cs.cmu.edu/~gu/publication/alignment_cvpr03.pdf
7. Z. Zhang, Z. Liu, D. Adler, M.F. Cohen, E. Hanson, Y. Shan, Robust and rapid generation of animated faces from video images – a model-based modeling approach. Int. J. Comput. Vis. 58, 93–119 (2004)
8. L. Gu, T. Kanade, 3D alignment of face in a single image, in CVPR, New York, 2006
9. L. Gu, T. Kanade, A generative shape regularization model for robust face alignment, in The Tenth European Conference on Computer Vision, Marseille, 2008
10. C. Goodall, Procrustes methods in the statistical analysis of shape. J. R. Stat. Soc. Ser. B (Methodol.) 53, 285–339 (1991)
11. M.E. Tipping, C.M. Bishop, Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B 61(3), 611–622 (1999)
© 2015 Springer Science+Business Media New York
Gu, L., Kanade, T. (2015). Face Alignment. In: Li, S.Z., Jain, A.K. (eds) Encyclopedia of Biometrics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7488-4_186