Joint face alignment and segmentation via deep multi-task learning

Multimedia Tools and Applications

Abstract

Face alignment and segmentation are challenging problems that have been extensively studied in the field of multimedia. The two tasks are closely related, and their learning processes are expected to benefit each other. Hence, we present a joint multi-task learning algorithm for both face alignment and segmentation using a deep convolutional neural network (CNN). The proposed multi-task learning approach allows the CNN model to share visual knowledge between the two tasks during training. With a carefully designed refinement residual module, cross-layer features are fused in a collaborative manner. To the best of our knowledge, this is the first time that face alignment and segmentation have been learned together via deep multi-task learning. Our experiments show that learning these two related tasks simultaneously builds a synergy between them, improves the performance of each individual task, and performs comparably to recent approaches. Furthermore, we demonstrate the effectiveness of our model in two practical applications: virtual makeup and face swap.
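The abstract does not state the training objective. A common way to train one shared network on both tasks is to minimize a weighted sum of a landmark regression loss and a pixel-wise segmentation loss. The sketch below assumes a mean squared landmark error and a mean per-pixel cross-entropy; the function names and the `seg_weight` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def landmark_l2_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between predicted and ground-truth landmarks.

    Assumption: an L2 alignment loss; the paper's actual loss is not given here.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(pred)


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(z - m) for z in scores))
    return [z - lse for z in scores]


def segmentation_ce_loss(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean pixel-wise cross-entropy; logits[i] holds per-class scores for pixel i.

    Assumption: a standard softmax cross-entropy segmentation loss.
    """
    if not labels or len(logits) != len(labels):
        raise ValueError("logits and labels must be non-empty and of equal length")
    total = 0.0
    for scores, label in zip(logits, labels):
        total -= log_softmax(scores)[label]
    return total / len(labels)


def joint_loss(
    pred_landmarks: Sequence[Point],
    gt_landmarks: Sequence[Point],
    seg_logits: Sequence[Sequence[float]],
    seg_labels: Sequence[int],
    seg_weight: float = 1.0,  # hypothetical balancing weight between the two tasks
) -> float:
    """Hypothetical multi-task objective: alignment loss + seg_weight * segmentation loss."""
    return landmark_l2_loss(pred_landmarks, gt_landmarks) + seg_weight * segmentation_ce_loss(
        seg_logits, seg_labels
    )


if __name__ == "__main__":
    # Two landmarks, one off by 1 pixel; one pixel with uniform scores over 2 classes.
    loss = joint_loss(
        [(1.0, 2.0), (3.0, 4.0)],
        [(1.0, 2.0), (3.0, 5.0)],
        [[0.0, 0.0]],
        [0],
    )
    print(round(loss, 4))
```

Because both terms are computed from the outputs of one shared network, gradients from each task update the shared layers, which is one way the "shared visual knowledge" described above can arise. The weight `seg_weight` matters in practice because landmark errors (in pixels squared) and cross-entropy values live on different scales.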

Notes

  1. http://www.taaz.com/

Acknowledgements

The Titan X used for this research was donated by the NVIDIA Corporation.

Author information

Corresponding author

Correspondence to Weiming Dong.

Additional information

This work was supported by the National Natural Science Foundation of China under Nos. 61672520, 61702488, 61501464 and 6120106003, by the Beijing Natural Science Foundation under No. 4162056, by the National Key Technology R&D Program of China under No. 2015BAH53F02, and by the CASIA-Tencent YouTu joint research project.

About this article

Cite this article

Zhao, Y., Tang, F., Dong, W. et al. Joint face alignment and segmentation via deep multi-task learning. Multimed Tools Appl 78, 13131–13148 (2019). https://doi.org/10.1007/s11042-018-5609-1

  • DOI: https://doi.org/10.1007/s11042-018-5609-1
