Skip to main content

Hybrid Attention Driven Text-to-Image Synthesis via Generative Adversarial Networks

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11731))

Abstract

With the development of generative models, image synthesis conditioned on the specific variable becomes an important research theme gradually. This paper presents a novel spectral normalization based Hybrid Attentional Generative Adversarial Networks (HAGAN) for text to image synthesis. The hybrid attentional mechanism is composed of text-image cross-modal attention and self-attention of image sub regions. Cross-modal attention mechanism contributes to synthesize more fine-grained and text-related image by introducing word-level semantic information in generative model. The self-attention solves the long distance reliance of image local-region features when generate image. With spectral normalization, the training of GANs become more stable than traditional GANs, which conduces to avoid model collapse and gradient vanishing or explosion. We conduct experiments on widely used Oxford-102 flower dataset and CUB bird dataset to validate our proposed method. During quantitative and non-quantitative experimental comparison, the results indicate that the proposed method achieves the best performance on Inception score (IS), Fréchet Inception Distance (FID) and visual effect.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)

    Google Scholar 

  2. Huang, H., Yu, P.S., Wang, C.: An introduction to image synthesis with generative adversarial nets. arXiv preprint arXiv:1803.04469 (2018)

  3. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

    Google Scholar 

  4. Wang, B., Yang, Y., Xu, X., Hanjalic, A., Shen, H.T.: Adversarial cross-modal retrieval. In: Proceedings of the 2017 on Multimedia Conference, pp. 154–162. ACM Press, California (2017)

    Google Scholar 

  5. Arjovsky, M., Bottou, L.: Towards principled methods for training generative adversarial networks. arXiv preprint arXiv:1701.04862 (2017)

  6. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017)

  7. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: Advances in Neural Information Processing Systems, pp. 5767–5777 (2017)

    Google Scholar 

  8. Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y.: Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957 (2018)

  9. Cai, L., Gao, H., Ji, S.: Multi-stage variational auto-encoders for coarse-to-fine image generation. arXiv preprint arXiv:1705.07202 (2017)

  10. Gulrajani, I., et al.: PixeLVAE: a latent variable model for natural images. arXiv preprint arXiv:1611.05013 (2016)

  11. Gregor, K., Danihelka, I., Graves, A., Rezende, D.J., Wierstra, D.: Draw: a recurrent neural network for image generation. In: International Conference on Machine Learning, Lille, pp. 1462–1471 (2015)

    Google Scholar 

  12. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396 (2016)

  13. Zhang, H., et al.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5907–5915. IEEE Press, Venice (2017)

    Google Scholar 

  14. Han, Z., Tao, X., Hongsheng, L., Shaoting, Z., Xiaogang, W., Xiaolei, H.: StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 41, 1947–1962 (2018)

    Google Scholar 

  15. Bodnar, C.: Text to image synthesis using generative adversarial networks. arXiv preprint arXiv:1805.00676 (2018)

  16. Xu, T., et al.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. arXiv preprint arXiv:1711.10485 (2017)

  17. Reed, S.E., Akata, Z., Mohan, S., Tenka, S., Schiele, B., Lee, H.: Learning what and where to draw. In: Advances in Neural Information Processing Systems, pp. 217–225 (2016)

    Google Scholar 

  18. Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017)

  19. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318 (2018)

  20. Mansimov, E., Parisotto, E., Ba, J.L., Salakhutdinov, R.: Generating images from captions with attention. In: International Conference on Learning Representations, San Juan (2016)

    Google Scholar 

  21. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 dataset (2011)

    Google Scholar 

  22. Nilsback, M.E., Zisserman, A.: Automated flower classification over a large number of classes. In: IEEE Sixth Indian Conference on Computer Vision, Graphics & Image Processing, pp. 722–729. IEEE Press, Bhubaneswar (2008)

    Google Scholar 

  23. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, pp. 2234–2242 (2016)

    Google Scholar 

  24. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Klambauer, G., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a nash equilibrium. arXiv preprint arXiv:1706.08500 (2017)

Download references

Acknowledgments

This work is supported by National Natural Science Foundation of China under grant 61771145 and grant 61371148.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaodong Gu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Cheng, Q., Gu, X. (2019). Hybrid Attention Driven Text-to-Image Synthesis via Generative Adversarial Networks. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions. ICANN 2019. Lecture Notes in Computer Science(), vol 11731. Springer, Cham. https://doi.org/10.1007/978-3-030-30493-5_47

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-30493-5_47

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30492-8

  • Online ISBN: 978-3-030-30493-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics