Abstract
Unconstrained face alignment remains challenging because of large poses, partial occlusions, and complicated illumination. To address these issues, we propose a mixed attention hourglass network (MAttHG) that learns more discriminative representations by modeling the correlations between features. Specifically, by integrating attention modules over features from different levels of the stacked hourglass network, MAttHG captures rich contextual correlations, which are then used to combine local features and better model the spatial relationships among facial landmarks. Furthermore, by combining the hourglass network with the attention module, MAttHG models both global and local attention, strengthening the facial shape constraints for robust face alignment. In addition, a head pose prediction module adaptively adjusts the weight of each training sample and redefines the loss function to address data imbalance. Experimental results on challenging benchmark datasets show that MAttHG outperforms state-of-the-art face alignment methods.
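The abstract does not state how the head pose prediction module turns a predicted pose into a sample weight. As a minimal sketch only, assuming a hypothetical weighting that grows with the predicted yaw, pitch, and roll, the idea of up-weighting rare large-pose samples might look like the following. The function names, the `1 - cos` form of the weight, and the squared-error term are all illustrative assumptions, not the MAttHG loss.

```python
import math


def pose_weight(yaw_deg, pitch_deg, roll_deg):
    """Hypothetical sample weight: 1.0 for a frontal face, larger for rotated heads.

    The 1 - cos(angle) form is an assumption made for illustration.
    """
    angles = (yaw_deg, pitch_deg, roll_deg)
    return 1.0 + sum(1.0 - math.cos(math.radians(a)) for a in angles)


def weighted_landmark_loss(pred, target, poses):
    """Average over samples of pose_weight * mean squared landmark error.

    pred, target: one list per sample, each a list of (x, y) landmarks.
    poses: one (yaw, pitch, roll) tuple in degrees per sample.
    """
    total = 0.0
    for sample_pred, sample_target, pose in zip(pred, target, poses):
        # Mean squared Euclidean error over this sample's landmarks.
        sq_err = sum(
            (px - tx) ** 2 + (py - ty) ** 2
            for (px, py), (tx, ty) in zip(sample_pred, sample_target)
        ) / len(sample_pred)
        # Large-pose samples contribute more to the batch loss.
        total += pose_weight(*pose) * sq_err
    return total / len(pred)
```

Under these assumptions, two samples with the same landmark error contribute differently to the loss: a profile face counts for more than a frontal one, which is one simple way to counter a training set dominated by near-frontal faces.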
Acknowledgements
This work is supported by the Natural Science Foundation of Guangdong Province under Grant No. 2019A1515111121.
Cite this article
Yang, Z., Shao, X., Wan, J. et al. Mixed attention hourglass network for robust face alignment. Int. J. Mach. Learn. & Cyber. 13, 869–881 (2022). https://doi.org/10.1007/s13042-021-01424-3