Abstract
Recent studies have shown remarkable success in the face image generation task. However, existing approaches offer limited diversity, quality and controllability in the generated results. To address these issues, we propose a novel end-to-end learning framework that generates diverse, realistic and controllable face images guided by face masks. The face mask provides a good geometric constraint for a face by specifying the size and location of its different components, such as the eyes, nose and mouth. The framework consists of four components: a style encoder, a style decoder, a generator and a discriminator. The style encoder generates a style code that represents the style of the resulting face; the generator translates the input face mask into a realistic face based on the style code; the style decoder learns to reconstruct the style code from the generated face image; and the discriminator classifies an input face image as real or fake. With the style code, the proposed model can generate different face images that match the same input face mask, and by manipulating the face mask, we can finely control the generated face image. We empirically demonstrate the effectiveness of our approach on the mask-guided face image synthesis task.
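The wiring of the four components can be illustrated with a toy sketch. It is not the authors' implementation: the real components are presumably trained neural networks, whereas every function below is a hand-written stand-in operating on flat lists of floats. The names `STYLE_DIM`, `forward_pass` and `style_reconstruction_loss`, the choice to sample the style code from random noise, and the L1 form of the reconstruction loss are all assumptions made for illustration.

```python
import math
import random

STYLE_DIM = 4  # hypothetical length of the style code


def style_encoder(rng):
    """Stand-in style encoder: assumed here to sample a style code from noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(STYLE_DIM)]


def generator(mask, style):
    """Stand-in generator: keeps the mask geometry and perturbs it with the style."""
    return [m + 0.1 * style[i % STYLE_DIM] for i, m in enumerate(mask)]


def style_decoder(image):
    """Stand-in style decoder: maps an image back to a STYLE_DIM-long code.

    Assumes len(image) >= STYLE_DIM so that every slice is non-empty.
    """
    codes = []
    for k in range(STYLE_DIM):
        pixels = image[k::STYLE_DIM]
        codes.append(sum(pixels) / len(pixels))
    return codes


def discriminator(image):
    """Stand-in discriminator: returns a 'probability of being real' in (0, 1)."""
    mean = sum(image) / len(image)
    return 1.0 / (1.0 + math.exp(-mean))


def style_reconstruction_loss(style, style_rec):
    """Mean absolute difference between the original and reconstructed codes."""
    return sum(abs(a - b) for a, b in zip(style, style_rec)) / len(style)


def forward_pass(mask, rng):
    """One pass through the four components, returning all intermediate values."""
    style = style_encoder(rng)
    image = generator(mask, style)
    style_rec = style_decoder(image)
    return {
        "style": style,
        "image": image,
        "style_rec": style_rec,
        "p_real": discriminator(image),
        "style_loss": style_reconstruction_loss(style, style_rec),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    toy_mask = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # flattened toy mask
    result = forward_pass(toy_mask, rng)
    print(result["p_real"], result["style_loss"])
```

Calling `forward_pass` twice with the same mask draws two different style codes and therefore yields two different toy images, which mirrors the diversity mechanism described in the abstract; editing the mask values changes the output geometry, which mirrors the controllability.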
Acknowledgements
We would like to thank the anonymous reviewers for their valuable feedback. This work was supported by the National Key Research and Development Program of China (2018YFF0214700).
Author information
Song Sun received the BS and MS degrees in software engineering from Chongqing University, China in 2011 and 2014, respectively. He is currently pursuing the PhD degree with the School of Big Data & Software Engineering, Chongqing University, China. His research interests include recommender systems, computer vision and machine learning.
Bo Zhao received his BSc degree in Networking Engineering from Southwest Jiaotong University, China in 2010, and his PhD degree from the School of Information Science and Technology, Southwest Jiaotong University, China. He is currently a Postdoctoral Research Fellow at the Department of Computer Science, the University of British Columbia, Canada. His research interests include multimedia, computer vision and machine learning.
Muhammad Mateen received a master's degree in computer science from Air University, Pakistan in 2015 and a PhD degree in software engineering from Chongqing University, China in 2020. He is currently an Assistant Professor at Air University Multan Campus, Pakistan. His research interests include software engineering, image processing, and deep learning. He is a member of the China Computer Federation (CCF).
Xin Chen received his BS degree from the School of Software Engineering, Chongqing University, China in 2004, his MS degree from the School of Computer Science and Engineering, Beihang University, China in 2007, and his PhD degree from the College of Computer Science, Chongqing University, China in 2017. He is currently a researcher in the School of Big Data and Software Engineering, Chongqing University, China. His research interests focus on dynamical systems, big data, consensus of multi-agent systems and neural networks.
Junhao Wen received the PhD degree from Chongqing University, China in 2008, where he is a professor with the School of Big Data & Software Engineering. His research interests include service computing, cloud computing, and dependable software engineering. He has published over 80 refereed journal and conference papers in these areas. He has received over 30 research and industrial grants and has developed many commercial systems and software tools.
Cite this article
Sun, S., Zhao, B., Mateen, M. et al. Mask guided diverse face image synthesis. Front. Comput. Sci. 16, 163311 (2022). https://doi.org/10.1007/s11704-020-0400-7
DOI: https://doi.org/10.1007/s11704-020-0400-7