
A face template: Improving the face generation quality of multi-stage generative adversarial networks using coarse-grained facial priors

Published in: Multimedia Tools and Applications

Abstract

Current text-to-face synthesis models rely solely on text descriptions for image synthesis, neglecting prior information about basic facial structure. As a result, the generator at each stage learns neither coarse-grained facial features, such as the face shape and the positions of the basic facial organs, nor fine-grained features, such as wrinkles and hair texture, sufficiently well, and the quality of the generated faces is low. To address this issue, we propose a generic face template that encodes only facial information common to all faces, such as the face contour and the shapes and relative positions of the basic organs. To embed this template into the first of the three generator stages, we design a Facial Coarse-grained Feature Excitation Module (FCFEM). FCFEM extracts coarse-grained channel weights from the face template and uses them to recalibrate the channels of the first-stage generator's intermediate feature maps, yielding more precise and complete initial face images and thereby helping the generators in the latter two stages learn fine-grained features. Experiments on the Multi-Modal CelebA-HQ dataset show that multi-stage models equipped with our method generate face images of higher quality and realism than the original models, and achieve stronger semantic consistency between the generated images and the text descriptions.
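The abstract describes FCFEM as extracting channel weights from the face template and using them to recalibrate the first-stage generator's intermediate feature maps, a mechanism closely related to squeeze-and-excitation networks [7]. The PyTorch sketch below illustrates one plausible reading of that design; the class interface, the reduction ratio, and the assumption that template features and generator features share a channel dimension are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FCFEM(nn.Module):
    """Hypothetical sketch of a Facial Coarse-grained Feature Excitation
    Module: squeeze-and-excitation style channel recalibration in which the
    per-channel gates are computed from face-template features rather than
    from the generator's own feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pool -> one statistic per channel
        self.excite = nn.Sequential(            # bottleneck MLP -> per-channel gates in [0, 1]
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, gen_feat: torch.Tensor, template_feat: torch.Tensor) -> torch.Tensor:
        # gen_feat:      intermediate feature maps of the first-stage generator, (B, C, H, W)
        # template_feat: features extracted from the generic face template, (B, C, H, W)
        b, c, _, _ = template_feat.shape
        w = self.squeeze(template_feat).view(b, c)  # (B, C) channel statistics of the template
        w = self.excite(w).view(b, c, 1, 1)         # (B, C, 1, 1) coarse-grained channel weights
        return gen_feat * w                         # recalibrate the generator's channels


if __name__ == "__main__":
    fcfem = FCFEM(channels=256)
    gen = torch.randn(4, 256, 64, 64)    # stand-in for first-stage generator features
    tmpl = torch.randn(4, 256, 64, 64)   # stand-in for face-template features
    print(fcfem(gen, tmpl).shape)        # torch.Size([4, 256, 64, 64])
```

Deriving the gates from the template rather than from `gen_feat` itself is what would let a fixed coarse-grained prior steer which channels of the initial face image are emphasized; in a standard SE block [7], the gates are instead computed from the feature maps being recalibrated.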


Data Availability

The datasets generated during and/or analyzed during the current study are available in the repository [30], https://github.com/IIGROUP/MM-CelebA-HQ-Dataset.

References

  1. Chen X, Qing L, He X, Luo X, Xu Y (2019) Ftgan: A fully-trained generative adversarial networks for text to face generation. arXiv preprint arXiv:1904.05729

  2. Cheng J, Wu F, Tian Y, Wang L, Tao D (2020) Rifegan: Rich feature generation for text-to-image synthesis from prior knowledge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10911–10920

  3. Di X, Patel VM (2019) Facial synthesis from visual attributes via sketch using multiscale generators. IEEE Transactions on Biometrics, Behavior, and Identity Science 2(1):55–67


  4. Gatt A, Tanti M, Muscat A, Paggio P, Farrugia RA, Borg C, Camilleri KP, Rosner M, Van der Plas L (2018) Face2text: Collecting an annotated image description corpus for the generation of rich face descriptions. arXiv preprint arXiv:1803.03827

  5. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Advances in Neural Information Processing Systems 27

  6. Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in Neural Information Processing Systems 30

  7. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7132–7141

  8. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196

  9. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410

  10. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer

  11. Li B, Qi X, Lukasiewicz T, Torr P (2019) Controllable text-to-image generation. Advances in Neural Information Processing Systems 32

  12. Liu Z, Luo P, Wang X, Tang X (2015) Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pages 3730–3738

  13. Luo X, He X, Chen X, Qing L, Zhang J (2022) Dualg-gan, a dual-channel generator based generative adversarial network for text-to-face synthesis. Neural Netw 155:155–167


  14. Lu Y, Tai YW, Tang CK (2018) Attribute-guided face generation using conditional cyclegan. In Proceedings of the European Conference on Computer Vision (ECCV), pages 282–297

  15. Nasir OR, Jha SK, Grover MS, Yu Y, Kumar A, Shah RR (2019) Text2facegan: Face generation from fine grained textual descriptions. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), pages 58–67. IEEE

  16. Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier gans. In International Conference on Machine Learning, pages 2642–2651. PMLR

  17. Park H, Yoo Y, Kwak N (2018) Mc-gan: Multi-conditional generative adversarial network for image synthesis. arXiv preprint arXiv:1805.01123

  18. Qiao X, Han Y, Wu Y, Zhang Z (2021) Progressive text-to-face synthesis with generative adversarial network. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pages 1–8. IEEE

  19. Qiao T, Zhang J, Xu D, Tao D (2019) Mirrorgan: Learning text-to-image generation by redescription. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1505–1514

  20. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434

  21. Reed S, Akata Z, Yan X, Logeswaran L, Schiele B, Lee H (2016) Generative adversarial text to image synthesis. In International Conference on Machine Learning, pages 1060–1069. PMLR

  22. Ruan S, Zhang Y, Zhang K, Fan Y, Tang F, Liu Q, Chen E (2021) Dae-gan: Dynamic aspect-aware gan for text-to-image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 13960–13969

  23. Stap D, Bleeker M, Ibrahimi S, ter Hoeve M (2020) Conditional image generation and manipulation for user-specified content. arXiv preprint arXiv:2005.04909

  24. Sun J, Deng Q, Li Q, Sun M, Ren M, Sun Z (2022) Anyface: Free-style text-to-face synthesis and manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18687–18696

  25. Sun J, Li Q, Wang W, Zhao J, Sun Z (2021) Multi-caption text-to-face synthesis: Dataset and algorithm. In Proceedings of the 29th ACM International Conference on Multimedia, pages 2290–2298

  26. Tao M, Tang H, Wu F, Jing XY, Bao BK, Xu C (2022) Df-gan: A simple and effective baseline for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16515–16525

  27. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology

  28. Wang Y, Dantcheva A, Bremond F (2018) From attribute-labels to faces: face generation using a conditional generative adversarial network. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops

  29. Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) Eca-net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11534–11542

  30. Xia W, Yang Y, Xue JH, Wu B (2021) Tedigan: Text-guided diverse face image generation and manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2256–2265

  31. Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X (2018) Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1316–1324

  32. Yin G, Liu B, Sheng L, Yu N, Wang X, Shao J (2019) Semantics disentangling for text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2327–2336

  33. Yuan Z, Zhang J, Shan S, Chen X (2021) Attributes aware face generation with generative adversarial networks. In 2020 25th International Conference on Pattern Recognition (ICPR), pages 1657–1664. IEEE

  34. Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2018) Stackgan++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41(8):1947–1962


  35. Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2017) Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5907–5915

  36. Zhu M, Pan P, Chen W, Yang Y (2019) Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5802–5810


Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to Hongxia Wang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Y., Wang, H. A face template: Improving the face generation quality of multi-stage generative adversarial networks using coarse-grained facial priors. Multimed Tools Appl 83, 21677–21693 (2024). https://doi.org/10.1007/s11042-023-16183-2
