
Mask guided diverse face image synthesis

Frontiers of Computer Science

Abstract

Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in their generated results. To address these issues, we propose a novel end-to-end learning framework that generates diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder produces a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images matching the same input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
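The four-component pipeline described above can be sketched with toy linear stand-ins. All shapes, weight matrices and function names below are illustrative assumptions for wiring only, not the paper's actual networks; the point is how the style code flows between components and why one mask can yield many faces.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 8  # tiny spatial size for illustration
C = 3      # number of mask components (e.g. eyes, nose, mouth)
D = 4      # style-code dimension

# Hypothetical linear stand-ins for the four networks.
W_enc = rng.normal(size=(D, H * W))              # style encoder: face -> style code
W_gen = rng.normal(size=(H * W, H * W * C + D))  # generator: (mask, style) -> face
W_dec = rng.normal(size=(D, H * W))              # style decoder: face -> style code
W_dis = rng.normal(size=(1, H * W))              # discriminator: face -> score

def style_encoder(face):
    return W_enc @ face.ravel()

def generator(mask, style):
    # conditions the "face" on both the mask (geometry) and the style code
    return (W_gen @ np.concatenate([mask.ravel(), style])).reshape(H, W)

def style_decoder(face):
    return W_dec @ face.ravel()

def discriminator(face):
    # sigmoid squashes the score into a fake/real probability
    return 1.0 / (1.0 + np.exp(-(W_dis @ face.ravel())[0]))

mask = rng.integers(0, 2, size=(H, W, C)).astype(float)

# Two different style codes with the same mask produce two different faces:
# this is the source of diversity in the framework.
s1, s2 = rng.normal(size=D), rng.normal(size=D)
face1, face2 = generator(mask, s1), generator(mask, s2)

# Style-reconstruction signal: the style decoder should recover the code
# from the generated face, tying each output back to its style.
style_rec_loss = np.mean((style_decoder(face1) - s1) ** 2)
```

Under this wiring, editing `mask` (moving or resizing a component) changes the geometry of every generated face, while resampling the style code changes its appearance, matching the controllability and diversity claims above.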



Acknowledgements

We would like to thank anonymous reviewers for their valuable feedback. This work is supported by the National Key Research and Development Program of China (2018YFF0214700).

Author information


Corresponding author

Correspondence to Junhao Wen.

Additional information

Song Sun received his BS and MS degrees in software engineering from Chongqing University, China in 2011 and 2014, respectively. He is currently pursuing a PhD degree with the School of Big Data & Software Engineering, Chongqing University, China. His research interests include recommender systems, computer vision and machine learning.

Bo Zhao received his BSc degree in networking engineering from Southwest Jiaotong University, China in 2010, and his PhD degree from the School of Information Science and Technology, Southwest Jiaotong University, China. He is currently a Postdoctoral Research Fellow at the Department of Computer Science, University of British Columbia, Canada. His research interests include multimedia, computer vision and machine learning.

Muhammad Mateen received his master's degree in computer science from Air University, Pakistan in 2015 and his PhD in software engineering from Chongqing University, China in 2020. He is currently working as an Assistant Professor at Air University Multan Campus, Pakistan. His research interests include software engineering, image processing, and deep learning. He is a member of the China Computer Federation (CCF).

Xin Chen received his PhD degree from the College of Computer Science, Chongqing University, China in 2017, his MS degree from the School of Computer Science and Engineering, Beihang University, China in 2007, and his BS degree from the School of Software Engineering, Chongqing University, China in 2004. He is currently a researcher in the School of Big Data and Software Engineering, Chongqing University, China. His research interests focus on dynamical systems, big data, consensus of multi-agent systems and neural networks.

Junhao Wen received his PhD degree from Chongqing University, China in 2008, where he is a professor with the School of Big Data & Software Engineering. His research interests include service computing, cloud computing, and dependable software engineering. He has published over 80 refereed journal and conference papers in these areas, received over 30 research and industrial grants, and developed many commercial systems and software tools.


Cite this article

Sun, S., Zhao, B., Mateen, M. et al. Mask guided diverse face image synthesis. Front. Comput. Sci. 16, 163311 (2022). https://doi.org/10.1007/s11704-020-0400-7
