Abstract
Mainstream direction in face alignment is now dominated by cascaded regression methods. These methods start from an image with an initial shape and build a set of shape increments by computing features with respect to the current shape estimate. These shape increments move the initial shape to the desired location. Despite the advantages of the cascaded methods, they all share two major limitations: (i) shape increments are learned separately from each other in a cascaded manner, (ii) the use of standard generic computer vision features such SIFT, HOG, does not allow these methods to learn problem-specific features. In this work, we propose a novel Recurrent Convolutional Face Alignment method that overcomes these limitations. We frame the standard cascaded alignment problem as a recurrent process and learn all shape increments jointly, by using a recurrent neural network with the gated recurrent unit. Importantly, by combining a convolutional neural network with a recurrent one we alleviate hand-crafted features, widely adopted in the literature and thus allowing the model to learn task-specific features. Moreover, both the convolutional and the recurrent neural networks are learned jointly. Experimental evaluation shows that the proposed method has better performance than the state-of-the-art methods, and further support the importance of learning a single end-to-end model for face alignment.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Cootes, T.F., Taylor, C.J.: Active shape models - ‘smart snakes’. In: Hogg, D., Boyle, R. (eds.) BMVC 1992. Springer, Heidelberg (1992)
Cootes, T.F., Taylor, C.J.: Active shape model search using local grey-level models: a quantitative evaluation. In: BMVC (1993)
Cootes, T.F., Edwards, G.J., Taylor, C.J.: TPAMI. Active appearance models 23, 681–685 (2001)
Cao, C., Weng, Y., Lin, S., Zhou, K.: 3D shape regression for real-time facial animation. In: SIGGRAPH (2013)
Xiong, X., De La Torre, F.: Supervised descent method and its applications to face alignment. In: CVPR (2013)
Yang, H., Patras, I.: Sieving regression forest votes for facial feature detection in the wild. In: ICCV, pp. 1936–1943 (2013)
Tzimiropoulos, G.: Project-out cascaded regression with an application to face alignment. In: CVPR (2015)
Zhu, S., Li, C., Change, C., Tang, X.: Face alignment by coarse-to-fine shape searching. In: CVPR (2015)
Xiong, X., Torre, F.D.: Global supervised descent method. In: CVPR (2015)
Tulyakov, S., Sebe, N.: Regressing a 3D face shape from a single image. In: ICCV (2015)
Kazemi, V., Josephine, S.: One millisecond face alignment with an ensemble of regression trees. In: CVPR (2014)
Jeni, L.A., Cohn, J.F., Kanade, T.: Dense 3D face alignment from 2D videos in real-time. In: FG (2015)
Ren, S., Cao, X., Wei, Y., Sun, J.: Face alignment at 3000 FPS via regressing local binary features. In: CVPR (2014)
Doll, P., Pietro, W., Perona, P.: Cascaded pose regression. In: CVPR (2010)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
Lowe, D.G.: Object recognition from local scale-invariant features. In: ICCV (1999)
Wang, W., Yan, Y., Winkler, S., Sebe, N.: Category specific dictionary learning for attribute specific feature selection. TIP 25, 1465–1478 (2016)
Wang, W., Yan, Y., Sebe, N.: Attribute guided dictionary learning. In: ICMR (2015)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
Wang, N., Yeung, D.Y.: Learning a deep compact image representation for visual tracking. In: NIPS (2013)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv (2014)
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: BMVC (2014)
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
Auli, M., Galley, M., Quirk, C., Zweig, G.: Joint language and translation modeling with recurrent neural networks. In: EMNLP (2013)
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv (2014)
Pinheiro, P.H.O., Collobert, R.: Recurrent convolutional neural networks for scene parsing. arXiv (2013)
Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. In: CVPR (2015)
Lai, S., Xu, L., Liu, K., Zhao, J.: Recurrent convolutional neural networks for text classification. In: AAAI (2015)
Wang, N., Gao, X., Tao, D., Li, X.: Facial feature point detection: a comprehensive survey. arXiv (2014)
Saragih, J.M., Lucey, S., Cohn, J.F.: Deformable model fitting by regularized landmark mean-shift. IJCV 91, 200–215 (2011)
Baltrusaitis, T., Robinson, P., Morency, L.P.: 3D constrained local model for rigid and non-rigid facial tracking. In: CVPR (2012)
Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: ICCV (2013)
Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. TPAMI 23(6), 681–685 (2001)
Gross, R., Matthews, I., Baker, S.: Generic vs. person specific active appearance models. IVC 23, 1080–1093 (2005)
Tzimiropoulos, G., Pantic, M.: Optimization problems for fast aam fitting in-the-wild. In: ICCV (2013)
Fanelli, G., Dantone, M., Van Gool, L.: Real time 3D face alignment with random forests-based active appearance models. In: FG (2013)
Zhou, S.K., Comaniciu, D.: Shape regression machine. In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 13–25. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73273-0_2
Cao, X.: Face alignment by explicit shape regression. In: CVPR (2012)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. IJCV 60, 91–110 (2004)
Tulyakov, S., Alameda-Pineda, X., Ricci, E., Yin, L., Cohn, J.F., Sebe, N.: Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions. In: CVPR (2016)
Cao, C., Weng, Y., Zhou, S., Tong, Y., Zhou, K.: Facewarehouse: a 3D facial expression database for visual computing. TVCG 20, 413–425 (2014)
Jourabloo, A., Liu, X.: Pose-invariant 3D face alignment. In: ICCV (2015)
Tulyakov, S., Vieriu, R.L., Semeniuta, S., Sebe, N.: Robust real-time extreme head pose estimation. In: International Conference on Pattern Recognition (2014)
Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. TSP 45, 2673–2681 (1997)
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence-video to text. In: ICCV (2015)
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR (2015)
Wang, W., Cui, Z., Yan, Y., Feng, J., Yan, S., Shu, X., Sebe, N.: Recurrent face aging. In: CVPR, pp. 2378–2386 (2016)
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: ICASSP (2013)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Koutnik, J., Greff, K., Gomez, F., Schmidhuber, J.: A clockwork RNN. arXiv (2014)
Jozefowicz, R., Zaremba, W., Sutskever, I.: An empirical exploration of recurrent network architectures. In: ICML (2015)
Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8692, pp. 184–199. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10593-2_13
Liang, X., Liu, S., Shen, X., Yang, J., Liu, L., Dong, J., Lin, L., Yan, S.: Deep human parsing with active template regression. TPAMI 37, 2402–2414 (2015)
Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: ICCV Workshops (2013)
Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR (2012)
Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. TPAMI 35, 2930–2940 (2013)
Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature localization. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7574, pp. 679–692. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33712-3_49
Messer, K., Matas, J., Kittler, J., Luettin, J., Maitre, G.: XM2VTSDB: the extended M2VTS database. In: Second International Conference on Audio and Video-based Biometric Person Authentication (1999)
Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: CVPR (2013)
Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. IJCV 107, 177–190 (2014)
Burgos-Artizzu, X., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: ICCV (2013)
Smith, B., Brandt, J., Lin, Z., Zhang, L.: Nonparametric context modeling of local appearance for pose-and expression-robust facial landmark localization. In: CVPR (2014)
Zhao, X., Kim, T.K., Luo, W.: Unified face analysis by iterative multi-output random forests. In: CVPR (2014)
Tzimiropoulos, G., Pantic, M.: Gauss-Newton deformable part models for face alignment in-the-wild. In: CVPR (2014)
Zhang, J., Shan, S., Kan, M., Chen, X.: Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 1–16. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10605-2_1
Wang, W., Yan, Y., Nie, L., Zhang, L., Winkler, S., Sebe, N.: Sparse code filtering for action pattern mining. In: ACCV (2016)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, W., Tulyakov, S., Sebe, N. (2017). Recurrent Convolutional Face Alignment. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science(), vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-54184-6_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54183-9
Online ISBN: 978-3-319-54184-6
eBook Packages: Computer ScienceComputer Science (R0)