
SS-GANs: Text-to-Image via Stage by Stage Generative Adversarial Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11858)

Abstract

Realistic text-to-image synthesis has improved greatly in recent years. However, most work ignores the relationship between low and high resolutions and adopts identical modules in different stages. This is inappropriate because the generation stages differ substantially. We therefore propose a novel network structure, SS-GANs, in which stage-specific modules are added to satisfy the unique requirements of each stage. In addition, we explore an effective training strategy named coordinated training and a simple negative-sample selection mechanism. Finally, we train our model on the Oxford-102 dataset, on which it outperforms state-of-the-art models.
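To illustrate the core idea, a minimal sketch of stage-by-stage generation in plain Python, where each stage upsamples the image and applies its own stage-specific refinement instead of reusing an identical block. The `Stage` class, the refiner functions, and the toy 2x2 input are all hypothetical illustrations of the concept, not the authors' architecture or code.

```python
class Stage:
    """One generation stage: upsamples 2x, then applies a stage-specific refiner."""
    def __init__(self, name, refine):
        self.name = name
        self.refine = refine  # stage-specific operation (differs per stage)

    def __call__(self, image):
        up = upsample2x(image)
        return [[self.refine(v) for v in row] for row in up]

def upsample2x(image):
    # Nearest-neighbour 2x upsampling of a 2-D grid of floats.
    out = []
    for row in image:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Different stages use different refiners, mirroring the point that low- and
# high-resolution stages have different requirements (illustrative choices only).
stages = [
    Stage("coarse", lambda v: max(0.0, v)),       # e.g. rough shape/colour layout
    Stage("fine", lambda v: min(1.0, v * 1.05)),  # e.g. detail sharpening
]

image = [[0.5, 0.2], [0.1, 0.9]]  # toy 2x2 "latent image"
for stage in stages:
    image = stage(image)
print(len(image), len(image[0]))  # prints: 8 8
```

In the paper's full model each stage is a learned generator module trained adversarially; the sketch only shows the structural point that the pipeline composes distinct, resolution-specific modules rather than one repeated block.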

The first author, Ming Tian, is a master's candidate.



Acknowledgement

This work was supported in part by the National Natural Science Foundation of China under Grants 61571354 and 61671385, and in part by the China Postdoctoral Science Foundation under Grant 158201.

Author information


Corresponding author

Correspondence to Chunna Tian.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Tian, M., Xue, Y., Tian, C., Wang, L., Deng, D., Wei, W. (2019). SS-GANs: Text-to-Image via Stage by Stage Generative Adversarial Networks. In: Lin, Z., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2019. Lecture Notes in Computer Science, vol 11858. Springer, Cham. https://doi.org/10.1007/978-3-030-31723-2_40


  • DOI: https://doi.org/10.1007/978-3-030-31723-2_40


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31722-5

  • Online ISBN: 978-3-030-31723-2

  • eBook Packages: Computer Science, Computer Science (R0)
