Abstract
Recently, Generative Adversarial Networks (GANs) have been the mainstream technology for text-to-image synthesis. However, vanilla deep neural networks tend to approximate continuous mappings, whereas real generation tasks often require discontinuous mappings over discrete modes. When trained on datasets containing multiple classes, a GAN may fail to synthesize diverse images, a failure known as mode collapse. To address this, we propose the Multi-generator Text Conditioned Generative Adversarial Network (MTC-GAN). The textual description of a real image is embedded and combined with the noise vector as a constraint. Building on Deep Convolutional Generative Adversarial Networks (DCGAN), multiple generators are incorporated to capture high-probability regions of the target distribution. Because the discriminator must identify which generator produced a given fake sample, it forces the generators to specialize in different, identifiable modes. This globally constrained approach makes the generated images more diverse. Multiple generators also indirectly improve the functional shape of the discriminator, which should make the GAN more stable when trained in high-dimensional spaces. Experimental results on standard datasets demonstrate the good performance of the proposed method: mode collapse is alleviated and the generated samples are more diverse.
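The conditioning and source-identification ideas in the abstract can be illustrated with a minimal NumPy sketch. All dimensions, the linear "generators", and the softmax discriminator head below are illustrative assumptions, not the paper's architecture: each generator consumes the concatenation of a text embedding and a noise vector, and the discriminator outputs one logit per generator plus one for "real", so it must identify which generator produced a sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): text embedding, noise, image vector, K generators.
TXT_DIM, Z_DIM, IMG_DIM, K = 16, 8, 32, 3

# Each generator is sketched as a single linear map from [text; noise] to an image vector.
gen_weights = [rng.normal(size=(TXT_DIM + Z_DIM, IMG_DIM)) for _ in range(K)]

def generate(text_emb, k):
    """Condition generator k on the text embedding by concatenating it with noise."""
    z = rng.normal(size=Z_DIM)
    return np.concatenate([text_emb, z]) @ gen_weights[k]

# Discriminator head with K+1 outputs: one per generator plus one "real" class,
# so identifying the source generator is part of its objective.
disc_w = rng.normal(size=(IMG_DIM, K + 1))

def discriminate(img_vec):
    """Return a softmax distribution over {generator 1..K, real}."""
    logits = img_vec @ disc_w
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

text_emb = rng.normal(size=TXT_DIM)
fake = generate(text_emb, k=1)
scores = discriminate(fake)  # shape (K + 1,), sums to 1
```

In a full model the linear maps would be deep convolutional networks, and the cross-entropy loss over these K+1 classes is what pushes each generator toward a distinct mode.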
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Zhang, M., Li, C. & Zhou, Z. Text to image synthesis using multi-generator text conditioned generative adversarial networks. Multimed Tools Appl 80, 7789–7803 (2021). https://doi.org/10.1007/s11042-020-09965-5