MAGAN: Multi-attention Generative Adversarial Networks for Text-to-Image Generation

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13022)


Abstract

Although generative adversarial networks (GANs) are widely used in text-to-image generation and have made great progress, some problems remain. The convolution operations in these GAN-based methods act on local regions but cannot relate disjoint regions of an image, which leads to structural anomalies in the generated images. Moreover, the semantic consistency between generated images and their corresponding text descriptions still needs to be improved. In this paper, we propose a multi-attention generative adversarial network (MAGAN) for text-to-image generation. We use a self-attention mechanism to improve the overall quality of the images, so that target objects with definite structures can also be generated well. We use a multi-head attention mechanism to improve the semantic consistency between the generated images and their text descriptions. We conducted extensive experiments on three datasets: the Oxford-102 Flowers dataset, the Caltech-UCSD Birds dataset, and the COCO dataset. Our MAGAN achieves better results than representative methods such as AttnGAN, MirrorGAN, and ControlGAN.
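This excerpt does not include implementation details, but the two mechanisms the abstract names are standard building blocks. The sketch below is a minimal illustration, not the authors' code: a SAGAN-style self-attention block [18] that lets every spatial region attend to every other (compensating for the locality of convolution), and a multi-head cross-attention block [13] in which image regions attend to word embeddings (for image-text semantic consistency). All module names, shapes, and the residual wiring are illustrative assumptions.

```python
# Minimal sketch of the two attention mechanisms named in the abstract.
# NOT the authors' implementation; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention [18]: every spatial position can attend
    to every other, so disjoint regions of the feature map are related."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)    # (b, hw, c/8)
        k = self.key(x).flatten(2)                      # (b, c/8, hw)
        attn = F.softmax(q @ k, dim=-1)                 # (b, hw, hw) region pairs
        v = self.value(x).flatten(2)                    # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual connection


class WordRegionAttention(nn.Module):
    """Multi-head cross-attention [13]: image regions (queries) attend to
    word embeddings (keys/values) to tighten image-text consistency."""

    def __init__(self, channels, text_dim, heads=8):
        super().__init__()
        self.proj = nn.Linear(text_dim, channels)       # words -> image space
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x, words):
        # x: (b, c, h, w) image features; words: (b, seq, text_dim)
        b, c, h, w = x.shape
        regions = x.flatten(2).transpose(1, 2)          # (b, hw, c)
        ctx = self.proj(words)                          # (b, seq, c)
        out, _ = self.attn(regions, ctx, ctx)           # (b, hw, c)
        return out.transpose(1, 2).view(b, c, h, w) + x


if __name__ == "__main__":
    feats = torch.randn(2, 64, 16, 16)                  # dummy image features
    words = torch.randn(2, 12, 256)                     # dummy word embeddings
    feats = SelfAttention2d(64)(feats)
    feats = WordRegionAttention(64, 256)(feats, words)
    print(feats.shape)                                  # torch.Size([2, 64, 16, 16])
```

In a multi-stage generator, blocks like these would typically be interleaved with upsampling stages; where exactly MAGAN places them is described in the full paper, not in this excerpt.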


References

  1. Goodfellow, I., Xu, B., et al.: Generative adversarial nets. In: NIPS, pp. 2672–2680 (2014)

  2. Reed, S., Akata, Z., et al.: Generative adversarial text to image synthesis. In: ICML, pp. 1060–1069 (2016)

  3. Cho, K., Gulcehre, C., Schwenk, H., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: EMNLP (2014)

  4. Reed, S., Akata, Z., et al.: Learning what and where to draw. In: NIPS, pp. 217–225 (2016)

  5. Zhang, H., Xu, T., Li, H., et al.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. In: ICCV, pp. 5907–5915 (2017)

  6. Zhang, H., Xu, T., Li, H., et al.: StackGAN++: realistic image synthesis with stacked generative adversarial networks. IEEE TPAMI, pp. 1947–1962 (2018)

  7. Xu, T., Zhang, P., Huang, Q., et al.: AttnGAN: fine-grained text to image generation with attentional generative adversarial networks. In: CVPR, pp. 1316–1324 (2018)

  8. Qiao, T., Zhang, J., Xu, D., et al.: MirrorGAN: learning text-to-image generation by redescription. In: CVPR, pp. 1505–1514 (2019)

  9. Li, B., Qi, X., et al.: Controllable text-to-image generation. In: NIPS, pp. 2065–2075 (2019)

  10. Qiao, T., Zhang, J., et al.: Learn, imagine and create: text-to-image generation from prior knowledge. In: NIPS, pp. 885–895 (2019)

  11. Zhang, Z., Xie, Y., Yang, L.: Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In: CVPR, pp. 6199–6208 (2018)

  12. Zhu, M., Pan, P., Chen, W., et al.: DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis. In: CVPR, pp. 5802–5810 (2019)

  13. Vaswani, A., Shazeer, N., Jones, L., et al.: Attention is all you need. In: NIPS, pp. 5998–6008 (2017)

  14. Mirza, M., Osindero, S.: Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)

  15. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: ICLR (2016)

  16. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)

  17. Xu, K., Ba, J., Kiros, R., et al.: Show, attend and tell: neural image caption generation with visual attention. In: ICML (2015)

  18. Zhang, H., Goodfellow, I., Metaxas, D., et al.: Self-attention generative adversarial networks. In: ICML, pp. 7354–7363 (2019)

  19. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. IJCV, pp. 211–252 (2015)

  20. Cao, Y., Xu, J., Lin, S., et al.: GCNet: non-local networks meet squeeze-excitation networks and beyond. In: ICCV (2019)

  21. Wang, X., Girshick, R., Gupta, A., et al.: Non-local neural networks. In: CVPR (2018)

  22. Wah, C., Branson, S., Welinder, P., et al.: The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011)

  23. Nilsback, M., Zisserman, A.: Automated flower classification over a large number of classes. In: ICVGIP, pp. 722–729 (2008)

  24. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  25. Salimans, T., Goodfellow, I., Zaremba, W., et al.: Improved techniques for training GANs. In: NIPS, pp. 2226–2234 (2016)

  26. Heusel, M., Ramsauer, H., et al.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS, pp. 6626–6637 (2017)

  27. Szegedy, C., Ioffe, S., Shlens, J., et al.: Rethinking the inception architecture for computer vision. In: CVPR, pp. 2818–2826 (2016)

  28. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)


Acknowledgement

This work is supported by the Beijing Natural Science Foundation under No. 4202004.

Author information

Correspondence to Qi Dai.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Jia, X., Mi, Q., Dai, Q. (2021). MAGAN: Multi-attention Generative Adversarial Networks for Text-to-Image Generation. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol. 13022. Springer, Cham. https://doi.org/10.1007/978-3-030-88013-2_26


  • DOI: https://doi.org/10.1007/978-3-030-88013-2_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88012-5

  • Online ISBN: 978-3-030-88013-2

  • eBook Packages: Computer Science, Computer Science (R0)
