Abstract
Recently, Generative Adversarial Networks (GANs) have been the mainstream technology for text-to-image synthesis. However, vanilla deep neural networks tend to approximate continuous mappings, whereas real generation tasks often require discontinuous mappings over discrete modes. When trained on datasets containing multiple classes, a GAN may fail to synthesize diverse images, a failure known as mode collapse. To address this, we propose the Multi-generator Text Conditioned Generative Adversarial Network (MTC-GAN). The textual description of a real image is embedded and combined with the noise vector as a constraint. Building on Deep Convolutional Generative Adversarial Networks (DCGAN), multiple generators are incorporated to capture high-probability regions of the target distribution. Because the discriminator must identify which generator produced a given fake sample, it forces the generators to specialize in different, identifiable modes. This globally constrained approach makes the generated images more diverse. Multiple generators also indirectly improve the functional shape of the discriminator, which should make the GAN more stable when trained in high-dimensional spaces. Experimental results on standard datasets demonstrate the good performance of the proposed method: mode collapse is alleviated and the generated samples are more diverse.
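The conditioning and source-identification ideas in the abstract can be illustrated with a minimal NumPy sketch. All dimensions, the linear "generators", and the softmax discriminator head below are illustrative assumptions, not the paper's architecture: each generator consumes the concatenation of a text embedding and a noise vector, and the discriminator outputs one logit per generator plus one for "real", so it must identify which generator produced a sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): text embedding, noise, image vector, K generators.
TXT_DIM, Z_DIM, IMG_DIM, K = 16, 8, 32, 3

# Each generator is sketched as a single linear map from [text; noise] to an image vector.
gen_weights = [rng.normal(size=(TXT_DIM + Z_DIM, IMG_DIM)) for _ in range(K)]

def generate(text_emb, k):
    """Condition generator k on the text embedding by concatenating it with noise."""
    z = rng.normal(size=Z_DIM)
    return np.concatenate([text_emb, z]) @ gen_weights[k]

# Discriminator head with K+1 outputs: one per generator plus one "real" class,
# so identifying the source generator is part of its objective.
disc_w = rng.normal(size=(IMG_DIM, K + 1))

def discriminate(img_vec):
    """Return a softmax distribution over {generator 1..K, real}."""
    logits = img_vec @ disc_w
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

text_emb = rng.normal(size=TXT_DIM)
fake = generate(text_emb, k=1)
scores = discriminate(fake)  # shape (K + 1,), sums to 1
```

In a full model the linear maps would be deep convolutional networks, and the cross-entropy loss over these K+1 classes is what pushes each generator toward a distinct mode.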
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Zhang, M., Li, C. & Zhou, Z. Text to image synthesis using multi-generator text conditioned generative adversarial networks. Multimed Tools Appl 80, 7789–7803 (2021). https://doi.org/10.1007/s11042-020-09965-5