Abstract
Accurate and fast face alignment algorithms play an important role in many face-related applications. In general, model speed is inversely related to the number of parameters. We build our network from densely connected encoder-decoders, which is an efficient way to balance the number of parameters against localization accuracy. In each encoder-decoder, we introduce stacked depthwise convolutions and depthwise feature fusion within the same channel, which greatly improves the performance of depthwise convolution and reduces the number of model parameters. In addition, we enhance the mean squared error loss function by assigning a different penalty weight to each coordinate according to its distance from the position of the maximum value in the label heatmap. Experiments show that the model trained with the improved loss function obtains better localization results. We compare our method with state-of-the-art methods on 300W and WFLW. The localization error is 2.76% on the common subset of 300W, and the model is small (0.7M), using only approximately 1% of the number of parameters of the other models. The dataset and model based on WFLW are publicly available at https://github.com/iam-zhanghongliang/DC-EDN.
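The distance-dependent penalty described in the abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the Gaussian-shaped weight `1 + alpha * exp(-d^2 / (2 * sigma^2))`, the choice to penalize errors near the peak more heavily, and the names `peak_position`, `distance_weighted_mse`, `alpha` and `sigma`. It is written in dependency-free Python for clarity, not speed, and is not the authors' implementation.

```python
import math


def peak_position(heatmap):
    """Return (row, col) of the maximum value in a 2D heatmap (list of lists)."""
    best_r, best_c = 0, 0
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > heatmap[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


def distance_weighted_mse(pred, label, alpha=1.0, sigma=2.0):
    """Mean squared error with a per-pixel weight based on distance to the label peak.

    ASSUMPTION: the weight is 1 + alpha * exp(-d^2 / (2 * sigma^2)), where d is
    the Euclidean distance from the pixel to the label heatmap's maximum.
    With alpha = 0 this reduces to the ordinary mean squared error.
    """
    peak_r, peak_c = peak_position(label)
    total, count = 0.0, 0
    for r, (pred_row, label_row) in enumerate(zip(pred, label)):
        for c, (p, t) in enumerate(zip(pred_row, label_row)):
            d2 = (r - peak_r) ** 2 + (c - peak_c) ** 2
            weight = 1.0 + alpha * math.exp(-d2 / (2.0 * sigma ** 2))
            total += weight * (p - t) ** 2
            count += 1
    return total / count


# Toy 3x3 label heatmap with its peak at the center.
label = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
# The same error magnitude (0.5), once at the peak and once at a corner.
pred_center = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
pred_corner = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

print(distance_weighted_mse(pred_center, label))
print(distance_weighted_mse(pred_corner, label))
```

Under this assumed weighting, an error at the peak costs more than an equal error far from it, which pushes training to get the landmark location itself right. Whether the paper weights near or far pixels more heavily is not stated in the abstract.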
Acknowledgments
This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. N160504007) and the National Natural Science Foundation of China (Grant No. 31301086).
Cite this article
Yang, L., Zhang, H., Wei, P. et al. DC-EDN: densely connected encoder-decoder network with reinforced depthwise convolution for face alignment. Appl Intell 51, 5025–5039 (2021). https://doi.org/10.1007/s10489-020-01940-9
DOI: https://doi.org/10.1007/s10489-020-01940-9