Skip to main content
Log in

Spatial alignment network for facial landmark localization

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Facial Landmark Localization (FLL) on unconstrained images still remains challenging as they poses complex variation in face spatial structure and appearance. To address this problem, we propose a Spatial Alignment Network (SAN), which consist of two modules, like the transformation sub-network and the estimation sub-network. In the first module, we propose two methods to achieving spatial transformation, one is the handcrafted method which can ensure model stability and the other is the learning-based method which is efficient and flexible. In the second module, we add an attention layer in the deep CNN to enhance the importance of discriminative features and obtain more accurate results. Through extensive experiments, our model achieves good performance on several public challenging datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12

Similar content being viewed by others

References

  1. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3444–3451 (2013)

  2. Bartz, C., Yang, H., Meinel, C.: Stn-ocr: a single neural network for text detection and text recognition (2017)

  3. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2930–2940 (2013)

    Article  Google Scholar 

  4. Cai, Q., Gallup, D., Zhang, C., Zhang, Z.: 3d deformable face tracking with a commodity depth camera. In: Computer Vision - ECCV 2010, European Conference on Computer Vision, pp 229–242. Proceedings, Heraklion (2010)

  5. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2887–2894 (2012)

  6. Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: Scale-aware semantic image segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3640–3649 (2016)

  7. Chu, Q., Ouyang, W., Li, H., Wang, X, Liu, B., Yu, N.: Online multi-object tracking using cnn-based single object tracker with spatial-temporal attention mechanism (2017)

  8. Chu, X., Yang, W., Ouyang, W., Ma, C., Yuille, A.L., Wang, X.: Multi-context attention for human pose estimation (2017)

  9. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. In: European Conference on Computer Vision, pp. 484–498 (1998)

  10. Dollar, P., Welinder, P., Perona, P.: Cascaded pose regression. IEEE 238 (6), 1078–1085 (2010)

    Google Scholar 

  11. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  12. Jourabloo, A., Liu, X.: Pose-invariant 3d face alignment. In: IEEE International Conference on Computer Vision, pp. 3694–3702 (2016)

  13. Kingma, D.P., Adam, J.B.a.: A method for stochastic optimization. Computer Science (2014)

  14. Kowalski, M., Naruniec, J., Trzcinski, T: Deep alignment network: A convolutional neural network for robust face alignment, pp. 2034–2043 (2017)

  15. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp 1097–1105 (2012)

  16. Li, H., Li, Y., Liu, W., Dong, H.: Coarse-to-fine facial landmarks localization based on convolutional feature. In: 2017 International Conference on Behavioral, Economic, Socio-cultural Computing (BESC), pp. 1–6 (2017)

  17. Li, Y., Chang, M.-C., Farid, H., Lyu, S.: In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking. arXiv:1806.02877 (2018)

  18. Lin, C.H., Lucey, S.: Inverse compositional spatial transformer networks, pp. 2252–2260 (2016)

  19. Liu, Y., Jourabloo, A., Liu, X.: Learning deep models for face antispoofng: binary or auxiliary supervision (2018)

  20. Lv, J., Shao, X., Xing, J., Cheng, C., Zhou, X.: A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3691–3700 (2017)

  21. Mo, K.: Spatial transformer network

  22. Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2879–2886 (2012)

  23. Ranjan, R., Patel, V.M., Chellappa, R.: Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. IEEE Trans. Pattern Anal. Mach. Intell. PP(99), 1–1 (2016)

    Google Scholar 

  24. Rashid, M., Gu, X., Yong, J.L.: Interspecies knowledge transfer for facial keypoint detection (2017)

  25. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: A semi-automatic methodology for facial landmark annotation. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 896–903 (2013)

  26. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge The first facial landmark localization challenge. In: IEEE International Conference on Computer Vision Workshops, pp. 397–403 (2014)

  27. Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., Pantic, M: 300 faces in-the-wild challenge: database and results. Image Vis. Comput. 47, 3–18 (2016)

    Article  Google Scholar 

  28. Saragih, J.M., Lucey, S., Cohn, J.F.: Deformable Model Fitting by Regularized Landmark Mean-Shift. Kluwer Academic Publishers, Netherlands (2010)

    MATH  Google Scholar 

  29. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3476–3483 (2013)

  30. Trigeorgis, G., Snape, P., Nicolaou, M.A., Antonakos, E., Zafeiriou, S.: Mnemonic descent method: a recurrent process applied for end-to-end face alignment. In: Computer Vision and Pattern Recognition (2016)

  31. Tuzel, O., Marks, T.K., Tambe, S.: Robust face alignment using a mixture of invariant experts. In: European Conference on Computer Vision, pp. 825–841 (2016)

  32. Xie, S., Girshick, R., Dollar, P, Tu, Z., He, K.: Aggregated residual transformations for deep neural networks (2016)

  33. Xiong, X., Torre, F.D.L.: Supervised descent method and its applications to face alignment. In: Computer Vision and Pattern Recognition, pp. 532–539 (2013)

  34. Zhang, J., Shan, S., Kan, M., Chen, X.: Coarse-to-fine auto-encoder networks (cfan) for real-time face alignment. In: European Conference on Computer Vision, pp. 1–16 (2014)

  35. Zhang, Z., Luo, P., Chen, C.L., Tang, X.: Facial landmark detection by deep multi-task learning. In: European Conference on Computer Vision, pp. 94–108 (2014)

  36. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)

Download references

Acknowledgements

This work is supported by National Science Foundation of China Grant #61672088 and #61790575, Fundamental Research Funds for the Central Universities #2018JBZ002. The corresponding author is Yidong Li.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yidong Li.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Li, H., Li, Y., Xing, J. et al. Spatial alignment network for facial landmark localization. World Wide Web 22, 1481–1498 (2019). https://doi.org/10.1007/s11280-018-0615-9

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-018-0615-9

Keywords

Navigation