Abstract
For years, the so-called Constrained Local Model (CLM) and its variants have been the gold standard in face alignment tasks. The CLM combines an ensemble of local feature detectors whose locations are regularized by a shape model. Fitting such a model typically consists of an exhaustive local search using the detectors, followed by a global optimization that finds the CLM parameters that jointly maximize all the responses. However, one major drawback of CLMs is the inefficiency of the local search, which relies on a large number of expensive convolutions. This paper introduces the Gradient Shape Model (GSM), a novel approach that addresses this limitation. We are able to align a similar CLM without the need for any convolutions at all. We also use true analytical gradient and Hessian matrices, which are easy to compute, instead of their approximations. Our formulation is very general, allowing an optional 3D shape term to be seamlessly included. Additionally, we extend the GSM formulation through a cascade regression framework. This revised technique allows a substantial reduction in the complexity/dimensionality of the data term, making it possible to compute a denser, more accurate regression step per cascade level. Experiments on several standard datasets show that our proposed models run faster than state-of-the-art CLMs and perform better than recent cascade regression approaches.
Notes
It is worth mentioning that some authors use a slightly different pose parametrization \((\varvec{ \theta }'= [a-1,\, b,\, t_x, \, t_y]^T)\) that allows a special set of 4 eigenvectors, which linearly model the 2D pose, to be appended to \(\Phi \) (Matthews and Baker 2004).
The LFW dataset was excluded due to the lack of landmark annotations in the outer region of the face.
References
Akhter, I., Sheikh, Y., Khan, S., & Kanade, T. (2008). Nonrigid structure from motion in trajectory space. In Neural information processing systems.
Alabort-i-Medina, J., & Zafeiriou, S. (2017). A unified framework for compositional fitting of active appearance models. International Journal of Computer Vision, 121(1), 26–64.
Asthana, A., Zafeiriou, S., Cheng, S., & Pantic, M. (2013). Robust discriminative response map fitting with constrained local models. In IEEE conference on computer vision and pattern recognition.
Baker, S., & Matthews, I. (2001). Equivalence and efficiency of image alignment algorithms. In IEEE conference on computer vision and pattern recognition.
Baltrušaitis, T., Robinson, P., & Morency, L. (2013). Constrained local neural fields for robust facial landmark detection in the wild. In IEEE international conference on computer vision workshop, 300 faces in-the-wild challenge (300-W).
Belhumeur, P. N., Jacobs, D. W., Kriegman, D. J., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE conference on computer vision and pattern recognition.
Boddeti, V. N., Kanade, T., & Kumar, B. V. K. V. (2013). Correlation filters for object alignment. In IEEE conference on computer vision and pattern recognition.
Bolme, D. S., Beveridge, J. R., Draper, B. A., & Lui, Y. M. (2010). Visual object tracking using adaptive correlation filters. In IEEE conference on computer vision and pattern recognition.
Bulat, A., & Tzimiropoulos, G. (2017a). Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources. In IEEE international conference on computer vision.
Bulat, A., & Tzimiropoulos, G. (2017b). How far are we from solving the 2d and 3d face alignment problem? (and a dataset of 230,000 3d facial landmarks). In IEEE international conference on computer vision.
Burgos-Artizzu, X. P., Perona, P., & Dollár, P. (2013). Robust face landmark estimation under occlusion. In IEEE international conference on computer vision.
Cao, X., Wei, Y., Wen, F., & Sun, J. (2012). Face alignment by explicit shape regression. In IEEE conference on computer vision and pattern recognition.
Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.
Cootes, T. F., & Taylor, C. J. (2004). Statistical models of appearance for computer vision. Technical report, Imaging Science and Biomedical Engineering, University of Manchester.
Cootes, T. F., Taylor, C. J., Cooper, D. H., & Graham, J. (1995). Active shape models—Their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
Cootes, T. F., Edwards, G. J., & Taylor, C. J. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.
Cootes, T. F., Ionita, M., Lindner, C., & Sauer, P. (2012). Robust and accurate shape model fitting using random forest regression voting. In European conference on computer vision.
Cristinacce, D., & Cootes, T. F. (2006). Feature detection and tracking with constrained local models. In British machine vision conference.
Cristinacce, D., & Cootes, T. F. (2007). Boosted regression active shape models. In British machine vision conference.
Cristinacce, D., & Cootes, T. F. (2008). Automatic feature localisation with constrained local models. Pattern Recognition, 41(10), 3054–3067.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE conference on computer vision and pattern recognition.
Dantone, M., Gall, J., Fanelli, G., & Gool, L. V. (2012). Real-time facial feature detection using conditional regression forests. In IEEE conference on computer vision and pattern recognition.
Face++. (2018). Megvii face API. http://www.faceplusplus.com.
Fan, H., & Zhou, E. (2016). Approaching human level facial landmark localization by deep learning. Image and Vision Computing, 47, 27–35.
Fanelli, G., Dantone, M., & Gool, L. V. (2013). Real time 3d face alignment with random forests-based active appearance models. In IEEE international conference on automatic face and gesture recognition.
Galoogahi, H. K., Sim, T., & Lucey, S. (2013). Multi-channel correlation filters. In IEEE international conference on computer vision.
Gu, L., & Kanade, T. (2008). A generative shape regularization model for robust face alignment. In European conference on computer vision.
Guo, P. (2014). https://github.com/phg1024/CSCE625/tree/master/finalproject.
Henriques, J. F., Carreira, J., Caseiro, R., & Batista, J. (2013). Beyond hard negative mining: Efficient detector learning via block-circulant decomposition. In IEEE international conference on computer vision.
Huang, G. B., Ramesh, M., Berg, T., & Learned-Miller, E. (2007). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical report 07-49, University of Massachusetts, Amherst.
Huang, Z., Zhou, E., & Cao, Z. (2015). Coarse-to-fine face alignment with multi-scale local patch regression. arXiv:1511.04901.
Jacobs, H. O. (2014). How to stare at the higher-order n-dimensional chain rule without losing your marbles. Technical report. arXiv:1410.3493v3.
Jourabloo, A., & Liu, X. (2015). Pose-invariant 3d face alignment. In IEEE international conference on computer vision.
Kazemi, V., & Sullivan, J. (2014). One millisecond face alignment with an ensemble of regression trees. In IEEE conference on computer vision and pattern recognition.
Le, V., Brandt, J., Lin, Z., Boudev, L., & Huang, T. S. (2012). Interactive facial feature localization. In European conference on computer vision.
Lee, D., Park, H., & Yoo, C. D. (2015). Face alignment using cascade Gaussian process regression trees. In IEEE conference on computer vision and pattern recognition.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Martins, P., Caseiro, R., Henriques, J. F., & Batista, J. (2012). Discriminative Bayesian active shape models. In European conference on computer vision.
Martins, P., Caseiro, R., & Batista, J. (2014). Non-parametric Bayesian constrained local models. In IEEE conference on computer vision and pattern recognition.
Martins, P., Henriques, J. F., Caseiro, R., & Batista, J. (2016). Bayesian constrained local models revisited. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(4), 704–716.
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(1), 135–164.
Messer, K., Matas, J., Kittler, J., Luettin, J., & Maitre, G. (1999). XM2VTSDB: The extended M2VTS database. In International conference on audio and video-based biometric person authentication.
Newell, A., Yang, K., & Deng, J. (2016). Stacked hourglass networks for human pose estimation. In European conference on computer vision.
Nordstrom, M., Larsen, M., Sierakowski, J., & Stegmann, M. (2004). The IMM face database—An annotated dataset of 240 face images. Technical report. Technical University of Denmark, DTU.
Paquet, U. (2009). Convexity and Bayesian constrained local models. In IEEE conference on computer vision and pattern recognition.
Ren, S., Cao, X., Wei, Y., & Sun, J. (2014). Face alignment at 3000 fps via regressing local binary features. In IEEE conference on computer vision and pattern recognition.
Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M. (2013a). 300 faces in-the-wild challenge: The first facial landmark localization challenge. In IEEE international conference on computer vision workshops.
Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M. (2013b). A semi-automatic methodology for facial landmark annotation. In IEEE conference on computer vision and pattern recognition workshops.
Sagonas, C., Antonakos, E., Tzimiropoulos, G., & Pantic, M. (2016). 300 faces in-the-wild challenge: Database and results. Image and Vision Computing, Special Issue on Facial Landmark Localisation ‘In-The-Wild’, 47, 3–18.
Sánchez-Lozano, E., Tzimiropoulos, G., Martinez, B., De la Torre, F., & Valstar, M. (2018). A functional regression approach to facial landmark tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2017.2745568.
Saragih, J., Lucey, S., & Cohn, J. (2009). Face alignment through subspace constrained mean-shifts. In IEEE international conference on computer vision.
Saragih, J., Lucey, S., & Cohn, J. (2010). Deformable model fitting by regularized landmark mean-shifts. International Journal of Computer Vision, 91(2), 200–215.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman and Hall.
Songsri-in, K., Trigeorgis, G., & Zafeiriou, S. (2018). Deep & deformable: Convolutional mixtures of deformable part-based models. In IEEE conference on automatic face and gesture recognition.
Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In IEEE conference on computer vision and pattern recognition.
Trigeorgis, G., Snape, P., Nicolaou, M., Antonakos, E., & Zafeiriou, S. (2016). Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In IEEE conference on computer vision and pattern recognition.
Tzimiropoulos, G. (2015). Project-out cascaded regression with an application to face alignment. In IEEE conference on computer vision and pattern recognition.
Tzimiropoulos, G., & Pantic, M. (2014). Gauss–Newton deformable part models for face alignment in-the-wild. In IEEE conference on computer vision and pattern recognition.
Tzimiropoulos, G., & Pantic, M. (2017). Fast algorithms for fitting active appearance models to unconstrained images. International Journal of Computer Vision, 122(1), 17–33.
Tzimiropoulos, G., i Medina, J. A., Zafeiriou, S., & Pantic, M. (2012). Generic active appearance models revisited. In Asian conference on computer vision.
Valstar, M. F., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE conference on computer vision and pattern recognition.
Viola, P., & Jones, M. (2002). Robust real-time object detection. International Journal of Computer Vision, 57(2), 137–154.
Wang, Y., Lucey, S., & Cohn, J. (2008). Enforcing convexity for improved alignment with constrained local models. In IEEE conference on computer vision and pattern recognition.
Xiao, J., Baker, S., Matthews, I., & Kanade, T. (2004a). Real-time combined 2d+3d active appearance models. In IEEE conference on computer vision and pattern recognition.
Xiao, J., Chai, J., & Kanade, T. (2004b). A closed-form solution to non-rigid shape and motion recovery. In European conference on computer vision.
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its application to face alignment. In IEEE conference on computer vision and pattern recognition.
Xiong, X., & De la Torre, F. (2014). Supervised descent method for solving nonlinear least squares problems in computer vision. Technical report. arXiv:1405.0601.
Xiong, X., & De la Torre, F. (2015). Global supervised descent method. In IEEE conference on computer vision and pattern recognition.
Zadeh, A., Baltrušaitis, T., & Morency, L. P. (2017). Convolutional experts constrained local model for facial landmark detection. In IEEE computer vision and pattern recognition workshop (CVPRW), 2nd facial landmark localisation competition.
Zhang, J., Shan, S., Kan, M., & Chen, X. (2014a). Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In European conference on computer vision.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2014b). Facial landmark detection by deep multi-task learning. In European conference on computer vision.
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2016). Learning deep representation for face alignment with auxiliary attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 918–930.
Zhou, E., Fan, H., Cao, Z., Jiang, Y., & Yin, Q. (2013a). Extensive facial landmark localization with coarse-to-fine convolutional network cascade. In IEEE international conference on computer vision workshop, 300 faces in-the-wild challenge (300-W).
Zhou, F., Brandt, J., & Lin, Z. (2013b). Exemplar-based graph matching for robust facial landmark localization. In IEEE international conference on computer vision.
Zhu, S., Li, C., Loy, C., & Tang, X. (2015). Face alignment by coarse-to-fine shape searching. In IEEE conference on computer vision and pattern recognition.
Zhu, X., & Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild. In IEEE conference on computer vision and pattern recognition.
Acknowledgements
This work was supported by the Portuguese Science Foundation (FCT) through the Grant SFRH/BPD/90200/2012 (P. Martins).
Additional information
Communicated by Ming-Hsuan Yang.
Gradient Definitions
1.1 Hessian of the 2D Regularization Term (\(\mathbf{H}_{\text {R}}\))
The Hessian of the 2D regularization term is a \((2v+4)\) square matrix of the form:
where the main \((2v \times 2v)\) sub-matrix (constant, therefore can be precomputed) is
The 2D pose diagonal terms are given by
where \(\mathbf{0 }_v\) and \(\mathbf{1 }_v\) are v-sized vectors filled with zeros and ones, respectively. In the expressions above, \(\mathbf{s}^x\) and \(\mathbf{s}^y\) represent the x and y components (v-sized vectors) of the shape \(\mathbf{s}\). Additionally, note that \(\mathbf{s}_m\) (the 2v expanded vector that defines the centre of mass of the base mesh) is constant.
The 2D pose mixed terms are given by
Finally, the remaining mixed terms are
where \(\varvec{\delta }_i\) is a v-dimensional vector filled with zeros, except for a 1 at the ith element.
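As a minimal sketch of the building blocks named in this subsection, the NumPy snippet below constructs \(\mathbf{0 }_v\), \(\mathbf{1 }_v\) and \(\varvec{\delta }_i\), and assembles a symmetric \((2v+4)\) square matrix from a \(2v \times 2v\) shape block, a \(4 \times 4\) pose block and a \(2v \times 4\) mixed block. The block values are random placeholders, not the actual entries of \(\mathbf{H}_{\text {R}}\). The ordering (shape entries first, then the four pose parameters), the stacked x-then-y layout and the names `assemble_block_hessian` and `delta` are assumptions made only for illustration.

```python
import numpy as np


def delta(i, v):
    """v-dimensional indicator vector with a single 1 at index i (0-based here)."""
    d = np.zeros(v)
    d[i] = 1.0
    return d


def assemble_block_hessian(shape_block, pose_block, mixed_block):
    """Stack blocks into a symmetric (2v+4) x (2v+4) matrix.

    Assumed (hypothetical) layout: shape parameters first, pose parameters last.
    """
    return np.block([[shape_block, mixed_block],
                     [mixed_block.T, pose_block]])


v = 5                    # number of landmarks (illustrative value)
zeros_v = np.zeros(v)    # the vector 0_v
ones_v = np.ones(v)      # the vector 1_v

# Assuming a stacked [x; y] shape vector, this 2v vector picks out the x part.
x_selector = np.concatenate([ones_v, zeros_v])

# Placeholder blocks; the true entries come from the paper's equations.
rng = np.random.default_rng(0)
A = rng.standard_normal((2 * v, 2 * v))
shape_block = A @ A.T                 # symmetric 2v x 2v placeholder
P = rng.standard_normal((4, 4))
pose_block = P + P.T                  # symmetric 4 x 4 placeholder
mixed_block = rng.standard_normal((2 * v, 4))

H = assemble_block_hessian(shape_block, pose_block, mixed_block)
```

Because the shape block is constant, only the pose and mixed blocks would need to be refreshed between iterations under this layout.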
1.2 Gradients of the 2D \(+\) 3D Model
The gradient of the 3D regularization term, in Eq. 42, is
Recall that the gradient of the 3D-to-2D projection error is a \((2v) \times (2v+4+3v+6)\) matrix, defined as
with
where \(\mathbf{I}_{n}\) represents an n-dimensional identity matrix and the \(\otimes \) symbol denotes the Kronecker product.
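For readers less familiar with the Kronecker product, the short NumPy sketch below shows the two ways an identity matrix typically combines with another matrix under \(\otimes \). The matrix `B` and the size `n` are arbitrary illustrative values and are not taken from the paper.

```python
import numpy as np

n = 3
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# I_n (x) B: block-diagonal matrix holding n copies of B.
K_left = np.kron(np.eye(n), B)

# B (x) I_n: every entry B[i, j] becomes the n x n block B[i, j] * I_n.
K_right = np.kron(B, np.eye(n))
```

Both results are \(2n \times 2n\) matrices, but with different sparsity patterns, so the order of the operands matters.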
Finally, the Hessian of the 3D shape regularization term (in Eq. 43) is a \((2v+4+3v+6)\) square matrix given by
Note that this matrix is constant and can therefore be precomputed.
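To illustrate why a constant Hessian term is convenient, the following toy sketch runs Newton iterations in which the regularization Hessian is built once and reused at every step. It assumes a purely quadratic regularizer \(\tfrac{1}{2}\mathbf{x}^T \mathbf{H}_{reg}\, \mathbf{x}\) and a toy quadratic data term; the function name `newton_fit` and all values are hypothetical, and this is not the paper's fitting procedure.

```python
import numpy as np


def newton_fit(grad_data, hess_data, H_reg, x0, n_iters=10):
    """Newton iterations with a regularizer 0.5 * x^T H_reg x.

    H_reg is constant, so the caller builds it once and it is reused here.
    """
    x = x0.copy()
    for _ in range(n_iters):
        g = grad_data(x) + H_reg @ x      # full gradient
        H = hess_data(x) + H_reg          # full Hessian (constant part reused)
        x = x - np.linalg.solve(H, g)     # Newton update
    return x


# Toy data term 0.5 * x^T A x - b^T x, with A symmetric positive definite.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + np.eye(6)
b = rng.standard_normal(6)

H_reg = 0.1 * np.eye(6)                   # precomputed once, outside the loop

x_hat = newton_fit(lambda x: A @ x - b, lambda x: A, H_reg, np.zeros(6))
```

For this quadratic toy problem the minimizer satisfies \((\mathbf{A} + \mathbf{H}_{reg})\,\mathbf{x} = \mathbf{b}\), which gives a simple way to check the result.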
Cite this article
Martins, P., Henriques, J.F. & Batista, J. Gradient Shape Model. Int J Comput Vis 128, 2828–2848 (2020). https://doi.org/10.1007/s11263-020-01341-y
DOI: https://doi.org/10.1007/s11263-020-01341-y