Recurrent Convolutional Face Alignment

  • Conference paper
Computer Vision – ACCV 2016 (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10112)

Abstract

The mainstream direction in face alignment is now dominated by cascaded regression methods. These methods start from an image with an initial shape and build a set of shape increments by computing features with respect to the current shape estimate. These shape increments move the initial shape to the desired location. Despite the advantages of cascaded methods, they all share two major limitations: (i) shape increments are learned separately from each other in a cascaded manner, and (ii) the use of standard generic computer vision features such as SIFT and HOG does not allow these methods to learn problem-specific features. In this work, we propose a novel Recurrent Convolutional Face Alignment method that overcomes these limitations. We frame the standard cascaded alignment problem as a recurrent process and learn all shape increments jointly by using a recurrent neural network with gated recurrent units. Importantly, by combining a convolutional neural network with a recurrent one, we remove the need for the hand-crafted features widely adopted in the literature, thus allowing the model to learn task-specific features. Moreover, the convolutional and the recurrent neural networks are learned jointly. Experimental evaluation shows that the proposed method performs better than state-of-the-art methods and further supports the importance of learning a single end-to-end model for face alignment.
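The recurrent view of cascaded regression described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture. The patch-mean feature extractor stands in for the learned convolutional features, the weights are random rather than trained, and the names `gru_step`, `shape_indexed_features`, `recurrent_alignment`, and `init_params` are hypothetical. The GRU update follows one common convention for the gates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    """One gated recurrent unit update (one common gate convention)."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)               # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)               # reset gate
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde

def shape_indexed_features(image, shape, half=2):
    """Assumed stand-in for learned CNN features: the mean intensity of a
    small patch centred on each landmark of the current shape estimate."""
    height, width = image.shape
    feats = []
    for x, y in shape:
        cx = int(np.clip(round(x), half, width - half - 1))
        cy = int(np.clip(round(y), half, height - half - 1))
        patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
        feats.append(patch.mean())
    return np.array(feats)

def recurrent_alignment(image, init_shape, p, steps=4):
    """Refine a shape with a shared recurrent regressor: every step reads
    features at the current estimate and emits one shape increment."""
    shape = init_shape.copy()
    h = np.zeros(p["Uz"].shape[0])
    shapes = [shape.copy()]
    for _ in range(steps):
        x = shape_indexed_features(image, shape)
        h = gru_step(x, h, p)                        # state carried across steps
        delta = (p["Wo"] @ h).reshape(shape.shape)   # shape increment
        shape = shape + delta
        shapes.append(shape.copy())
    return shapes

def init_params(n_landmarks, hidden, rng, scale=0.1):
    """Random (untrained) parameters, for illustration only."""
    def mat(rows, cols):
        return scale * rng.standard_normal((rows, cols))
    return {
        "Wz": mat(hidden, n_landmarks), "Uz": mat(hidden, hidden),
        "Wr": mat(hidden, n_landmarks), "Ur": mat(hidden, hidden),
        "Wh": mat(hidden, n_landmarks), "Uh": mat(hidden, hidden),
        "Wo": mat(2 * n_landmarks, hidden),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32))
    init_shape = np.array([[8.0, 8.0], [16.0, 20.0], [24.0, 12.0]])
    params = init_params(n_landmarks=3, hidden=5, rng=rng)
    for t, s in enumerate(recurrent_alignment(image, init_shape, params)):
        print(t, np.round(s, 3).tolist())
```

In a classical cascade each stage would have its own separately fitted regressor. Here a single set of parameters is shared across steps and the hidden state carries information from one increment to the next, which is what would allow all increments to be trained jointly by backpropagation through the loop.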

Notes

  1. http://www.humansensing.cs.cmu.edu/intraface/index.php, http://www.zface.org/

Author information

Corresponding author

Correspondence to Wei Wang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wang, W., Tulyakov, S., Sebe, N. (2017). Recurrent Convolutional Face Alignment. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10112. Springer, Cham. https://doi.org/10.1007/978-3-319-54184-6_7

  • DOI: https://doi.org/10.1007/978-3-319-54184-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54183-9

  • Online ISBN: 978-3-319-54184-6

  • eBook Packages: Computer Science, Computer Science (R0)