Text to image synthesis using multi-generator text conditioned generative adversarial networks

Published in: Multimedia Tools and Applications

Abstract

Recently, the Generative Adversarial Network (GAN) has become the mainstream technology for text-to-image synthesis. However, vanilla deep neural networks tend to approximate continuous mappings, whereas real generation tasks often require discontinuous mappings between discrete modes. When trained on datasets with multiple classes, a GAN therefore often fails to synthesize diverse images, a failure known as mode collapse. To address this, we propose the Multi-generator Text Conditioned Generative Adversarial Network (MTC-GAN). The textual description of a real image is embedded and combined with the noise vector as a conditioning constraint. Building on Deep Convolutional Generative Adversarial Networks (DCGAN), multiple generators are incorporated to capture the high-probability modes of the target distribution. Because the discriminator must identify which generator produced a given fake sample, it forces the generators to specialize in different, identifiable modes. This global constraint makes the generated images more diverse. The multiple generators also indirectly improve the functional shape of the discriminator, which makes training more stable in high-dimensional spaces. Experimental results on standard datasets demonstrate the good performance of the proposed method: mode collapse is alleviated and the generated samples are more diverse.
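To make the mechanism concrete, here is a minimal PyTorch-style sketch of the idea the abstract describes: several generators share a text-conditioned input (text embedding concatenated with the noise vector), and a single discriminator both scores realism and predicts which generator produced a fake, pushing the generators toward distinct modes. All layer sizes, names (`Generator`, `Discriminator`, `K`, `NOISE_DIM`, `TEXT_DIM`), and the (K+1)-way classification head are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

K = 4            # number of generators (assumed)
NOISE_DIM = 100  # noise vector length (typical DCGAN value)
TEXT_DIM = 128   # compressed text-embedding length (assumed)

class Generator(nn.Module):
    """One DCGAN-style generator conditioned on a text embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(NOISE_DIM + TEXT_DIM, 256, 4, 1, 0),  # 1x1 -> 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),                   # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1),                     # 8x8 -> 16x16
            nn.Tanh(),
        )

    def forward(self, z, text_emb):
        # Concatenate noise and text embedding, reshape to a 1x1 spatial map.
        h = torch.cat([z, text_emb], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(h)

class Discriminator(nn.Module):
    """Outputs K+1 logits: class 0 = real, classes 1..K = which generator."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),    # 16 -> 8
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, True),  # 8 -> 4
            nn.Flatten(),
        )
        self.head = nn.Linear(256 * 4 * 4, K + 1)

    def forward(self, img):
        return self.head(self.features(img))

# One discriminator step (sketch): real images are labeled class 0 and each
# fake with its generator's index, so generators covering the same mode are
# penalized and pushed toward different, identifiable modes.
gens = [Generator() for _ in range(K)]
disc = Discriminator()
ce = nn.CrossEntropyLoss()

real = torch.randn(8, 3, 16, 16)      # stand-in for a batch of real images
text_emb = torch.randn(8, TEXT_DIM)   # stand-in for encoded captions
d_loss = ce(disc(real), torch.zeros(8, dtype=torch.long))
for k, g in enumerate(gens):
    fake = g(torch.randn(8, NOISE_DIM), text_emb)
    d_loss += ce(disc(fake), torch.full((8,), k + 1, dtype=torch.long))
# Each generator, in turn, maximizes the discriminator's "real" probability
# for its own samples (generator step omitted for brevity).
```

The (K+1)-way head is one common way to realize "the discriminator must identify which generator produced a sample"; the authors' exact objective may differ in its loss terms and architecture.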



Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Zhiping Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, M., Li, C. & Zhou, Z. Text to image synthesis using multi-generator text conditioned generative adversarial networks. Multimed Tools Appl 80, 7789–7803 (2021). https://doi.org/10.1007/s11042-020-09965-5
