
Mixed attention hourglass network for robust face alignment

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Unconstrained face alignment remains challenging because of large poses, partial occlusions and complicated illumination. To address these issues, we propose a mixed attention hourglass network (MAttHG) that learns more discriminative representations by modeling the correlations between features. Specifically, by integrating attention modules over features at different levels of the stacked hourglass networks, MAttHG captures rich contextual correlations, which are then used to combine local features and better model the spatial relationships among facial landmarks. Furthermore, by combining the hourglass network with the attention module, MAttHG models both global and local attention, strengthening the facial shape constraints for robust face alignment. In addition, a head pose prediction module adaptively adjusts the weight of each training sample and redefines the loss function to address data imbalance. Experimental results on challenging benchmark datasets demonstrate that MAttHG outperforms state-of-the-art face alignment methods.
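The abstract does not give the formula used to reweight samples by head pose, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each training sample comes with a predicted yaw angle, that samples are weighted by the inverse frequency of their yaw bin, and that the loss is a weighted mean squared landmark error. The function names `pose_weights` and `weighted_landmark_loss`, the bin count, and the yaw range are all hypothetical.

```python
def pose_weights(yaw_degrees, num_bins=3, max_yaw=90.0):
    """Assumed scheme: weight each sample by the inverse frequency of its yaw bin."""
    bin_width = max_yaw / num_bins
    # Bin by absolute yaw; angles at or beyond max_yaw fall into the last bin.
    bins = [min(int(abs(y) // bin_width), num_bins - 1) for y in yaw_degrees]
    counts = [0] * num_bins
    for b in bins:
        counts[b] += 1
    raw = [1.0 / counts[b] for b in bins]
    # Rescale so the weights average to 1 and the loss scale is unchanged.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_landmark_loss(pred, target, weights):
    """Weighted mean over samples of the mean squared landmark error."""
    total = 0.0
    for p, t, w in zip(pred, target, weights):
        per_sample = sum(
            (px - tx) ** 2 + (py - ty) ** 2
            for (px, py), (tx, ty) in zip(p, t)
        ) / len(p)
        total += w * per_sample
    return total / len(pred)


# Three near-frontal faces and one large-pose face: the rare pose gets more weight.
weights = pose_weights([0.0, 5.0, -10.0, 60.0])
```

Under these assumptions the single large-pose sample receives three times the weight of each near-frontal sample, so rare poses contribute more to the gradient without changing the overall loss scale.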



Acknowledgements

This work is supported by the Natural Science Foundation of Guangdong Province Grant No. 2019A1515111121.

Author information

Corresponding authors

Correspondence to Jun Wan or Rong Gao.


About this article

Cite this article

Yang, Z., Shao, X., Wan, J. et al. Mixed attention hourglass network for robust face alignment. Int. J. Mach. Learn. & Cyber. 13, 869–881 (2022). https://doi.org/10.1007/s13042-021-01424-3


  • DOI: https://doi.org/10.1007/s13042-021-01424-3
