Mask guided diverse face image synthesis

Sun, Song; Zhao, Bo; Mateen, Muhammad; Chen, Xin; Wen, Junhao

doi:10.1007/s11704-020-0400-7

Research Article
Published: 11 November 2021

Frontiers of Computer Science, Volume 16, article number 163311 (2022)

Song Sun¹, Bo Zhao², Muhammad Mateen³, Xin Chen¹ & Junhao Wen¹

Abstract

Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in the generated results. To address these issues, we propose a novel end-to-end learning framework that generates diverse, realistic and controllable face images guided by face masks. The face mask constrains the geometry of a face by specifying the size and location of its components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. By varying the style code, the proposed model can generate different face images that match the same input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
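To make the roles of the four components concrete, the following Python sketch wires up toy stand-ins for them and computes the training signals the abstract describes. It is a minimal illustration under stated assumptions, not the authors' implementation: every network is replaced by a fixed, untrained random linear map; the style encoder is assumed to map a random latent vector `z` to the style code (the abstract does not state the encoder's input); the style reconstruction loss is assumed to be L1; and the adversarial terms use the standard non-saturating GAN loss. The class `ToyMaskGuidedGAN`, the function `losses`, the component label set and all dimensions are hypothetical.

```python
import math
import random
from typing import Dict, List

Vector = List[float]
Mask = List[List[int]]     # H x W grid of face-component labels
Image = List[List[float]]  # H x W grayscale toy "face"

# Hypothetical label set: 0 background, 1 skin, 2 eyes, 3 nose, 4 mouth.
NUM_CLASSES = 5
Z_DIM = 8      # assumed size of the random latent fed to the style encoder
STYLE_DIM = 4  # assumed size of the style code


def rand_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def flatten(image: Image) -> Vector:
    return [p for row in image for p in row]


class ToyMaskGuidedGAN:
    """Untrained random linear stand-ins for the four components (data flow only)."""

    def __init__(self, height: int, width: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n_pixels = height * width
        self.enc = rand_matrix(STYLE_DIM, Z_DIM, rng)               # style encoder
        self.gen_bias = [rng.gauss(0.0, 0.5) for _ in range(NUM_CLASSES)]
        self.gen_mod = rand_matrix(NUM_CLASSES, STYLE_DIM, rng)     # generator
        self.dec = rand_matrix(STYLE_DIM, n_pixels, rng)            # style decoder
        self.disc = [rng.gauss(0.0, 0.1) for _ in range(n_pixels)]  # discriminator

    def style_encoder(self, z: Vector) -> Vector:
        return [math.tanh(x) for x in matvec(self.enc, z)]

    def generator(self, mask: Mask, s: Vector) -> Image:
        # The style code sets one shade per component (appearance) ...
        shade = [math.tanh(b + sum(w * x for w, x in zip(row, s)))
                 for b, row in zip(self.gen_bias, self.gen_mod)]
        # ... and the mask decides where each component is painted (geometry).
        return [[shade[label] for label in row] for row in mask]

    def style_decoder(self, image: Image) -> Vector:
        return [math.tanh(x) for x in matvec(self.dec, flatten(image))]

    def discriminator(self, image: Image) -> float:
        # Estimated probability that `image` is a real face.
        return sigmoid(sum(w * p for w, p in zip(self.disc, flatten(image))))


def losses(model: ToyMaskGuidedGAN, mask: Mask, z: Vector, real: Image) -> Dict[str, float]:
    s = model.style_encoder(z)         # style code for the output face
    fake = model.generator(mask, s)    # face that should follow the mask's layout
    s_hat = model.style_decoder(fake)  # style code recovered from the generated face
    d_fake = model.discriminator(fake)
    d_real = model.discriminator(real)
    return {
        # Generator: make D label the fake as real (non-saturating loss).
        "g_adv": -math.log(d_fake),
        # Generator + style decoder: L1 style reconstruction (nonzero here, untrained).
        "g_style": sum(abs(a - b) for a, b in zip(s, s_hat)) / STYLE_DIM,
        # Discriminator: classify real as real and fake as fake.
        "d_adv": -math.log(d_real) - math.log(1.0 - d_fake),
    }


if __name__ == "__main__":
    mask = [[0, 1, 1, 0],
            [1, 2, 2, 1],
            [1, 3, 3, 1],
            [0, 4, 4, 0]]
    model = ToyMaskGuidedGAN(height=4, width=4, seed=0)
    rng = random.Random(1)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]
    # Same mask, two style codes: two differently shaded faces with the same layout.
    face1 = model.generator(mask, model.style_encoder(z1))
    face2 = model.generator(mask, model.style_encoder(z2))
    print(face1[1], face2[1])
    print(losses(model, mask, z1, real=face2))  # face2 stands in for a real photo
```

In this toy generator each pixel depends only on its mask label and the style code, so different latents change appearance without moving any component, while editing the mask with a fixed style code changes only the edited pixels. A real generator would synthesize textures rather than flat shades, and the generator-side and discriminator-side losses would be minimized in alternating steps with respect to different parameters; the sketch only shows how the mask (geometry) and the style code (appearance) divide the work.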

Acknowledgements

We would like to thank the anonymous reviewers for their valuable feedback. This work is supported by the National Key Research and Development Program of China (2018YFF0214700).

Author information

Authors and Affiliations

School of Big Data & Software Engineering, Chongqing University, Chongqing, 401331, China
Song Sun, Xin Chen & Junhao Wen
Department of Computer Science, The University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
Bo Zhao
Department of Computer Science, Air University Multan Campus, Multan, 60000, Pakistan
Muhammad Mateen

Corresponding author

Correspondence to Junhao Wen.

Additional information

Song Sun received the BS and MS degrees in software engineering from Chongqing University, China, in 2011 and 2014, respectively. He is currently pursuing the PhD degree with the School of Big Data & Software Engineering, Chongqing University, China. His research interests include recommender systems, computer vision and machine learning.

Bo Zhao received the BSc degree in Networking Engineering from Southwest Jiaotong University, China, in 2010, and the PhD degree from the School of Information Science and Technology, Southwest Jiaotong University, China. He is currently a Postdoctoral Research Fellow with the Department of Computer Science, the University of British Columbia, Canada. His research interests include multimedia, computer vision and machine learning.

Muhammad Mateen received the master’s degree in computer science from Air University, Pakistan, in 2015 and the PhD degree in software engineering from Chongqing University, China, in 2020. He is currently an Assistant Professor at Air University Multan Campus, Pakistan. His research interests include software engineering, image processing, and deep learning. He is a member of the China Computer Federation (CCF).

Xin Chen received the PhD degree from the College of Computer Science, Chongqing University, China, in 2017, the MS degree from the School of Computer Science and Engineering, Beihang University, China, in 2007, and the BS degree from the School of Software Engineering, Chongqing University, China, in 2004. He is currently a researcher with the School of Big Data and Software Engineering, Chongqing University, China. His research interests include dynamical systems, big data, consensus of multi-agent systems and neural networks.

Junhao Wen received the PhD degree from Chongqing University, China, in 2008, where he is currently a professor with the School of Big Data & Software Engineering. His research interests include service computing, cloud computing and dependable software engineering. He has published over 80 refereed journal and conference papers in these areas. He has received over 30 research and industrial grants and has developed many commercial systems and software tools.

Electronic supplementary material