Abstract
Most existing face alignment algorithms based on Convolutional Neural Networks (CNNs) ignore the significance of the attention mechanism. In this paper, we propose a Multi-Attention Network (MANet) for robust face alignment. Our attention mechanism includes multi-level feature attention and multi-scale attention. Multi-level feature attention is introduced to attend to features of different levels: high-level feature attention is essential for capturing correlations among neighboring regions, whereas low-level feature attention focuses on the detailed description of local parts. Multi-scale attention, in turn, is designed to obtain a better representation of features at different scales. These attention mechanisms improve feature representation and information flow, guiding the network to emphasize key information and suppress less significant information. Experimental results on the 300W and WFLW datasets demonstrate the superiority of the proposed method over state-of-the-art approaches.
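The abstract does not specify how the two attention types are computed, so the following is only a minimal, hypothetical sketch of the general idea, not the MANet implementation. It assumes two common techniques in place of the paper's unstated design: a squeeze-and-excitation-style channel gate for multi-level feature attention (each level is reweighted before low- and high-level features are summed) and a softmax over per-scale scores for multi-scale attention. Feature maps are plain nested lists `[channel][flattened position]`, all scales are assumed to be already resized to a common resolution, and the weight and bias arguments are illustrative stand-ins for learned parameters.

```python
import math
from typing import List

# Hypothetical representation: FeatureMap[channel][flattened spatial position]
FeatureMap = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: FeatureMap, weights: List[float], bias: List[float]) -> FeatureMap:
    """SE-style gate (assumed technique): pool each channel, squash to (0, 1), rescale it."""
    gated = []
    for c, channel in enumerate(fmap):
        pooled = sum(channel) / len(channel)           # global average pooling
        gate = sigmoid(weights[c] * pooled + bias[c])  # per-channel attention weight
        gated.append([v * gate for v in channel])
    return gated


def multi_level_fusion(low: FeatureMap, high: FeatureMap,
                       w_low: List[float], b_low: List[float],
                       w_high: List[float], b_high: List[float]) -> FeatureMap:
    """Attend to low-level (local detail) and high-level (context) features, then sum them."""
    low_att = channel_attention(low, w_low, b_low)
    high_att = channel_attention(high, w_high, b_high)
    return [[a + b for a, b in zip(cl, ch)] for cl, ch in zip(low_att, high_att)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(scales: List[FeatureMap], scale_weights: List[float]) -> FeatureMap:
    """Score each scale from its mean activation, normalise with softmax, and blend the scales."""
    scores = []
    for w, fmap in zip(scale_weights, scales):
        values = [v for channel in fmap for v in channel]
        scores.append(w * sum(values) / len(values))
    alphas = softmax(scores)
    n_channels, n_positions = len(scales[0]), len(scales[0][0])
    return [[sum(a * s[c][p] for a, s in zip(alphas, scales)) for p in range(n_positions)]
            for c in range(n_channels)]


if __name__ == "__main__":
    low = [[1.0, 2.0], [3.0, 4.0]]
    high = [[0.5, 0.5], [2.0, 1.0]]
    fused = multi_level_fusion(low, high, [1.0, -1.0], [0.0, 0.0], [0.5, 0.5], [0.0, 0.0])
    blended = multi_scale_attention([low, high], [1.0, 1.0])
    print(fused)
    print(blended)
```

Because the gates lie in (0, 1) and the scale weights sum to 1, each operation can only rescale or reweight existing responses; in this toy setting that is one concrete reading of the abstract's "emphasize the key information and suppress the less significant information".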
This work was supported in part by the National Natural Science Foundation of China under Grant 61372137 and in part by the Natural Science Foundation of Anhui Province, China, under Grant 1908085MF209.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, X., Wang, H., Cheng, R., Yan, X., Tao, L. (2019). Multi-Attention Network for 2D Face Alignment in the Wild. In: Wang, Y., Huang, Q., Peng, Y. (eds) Image and Graphics Technologies and Applications. IGTA 2019. Communications in Computer and Information Science, vol 1043. Springer, Singapore. https://doi.org/10.1007/978-981-13-9917-6_24
DOI: https://doi.org/10.1007/978-981-13-9917-6_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9916-9
Online ISBN: 978-981-13-9917-6
eBook Packages: Computer Science, Computer Science (R0)