Abstract
Maintaining semantic consistency between a generated image and its natural-language description is a central challenge of text-to-image generation. This paper proposes a bidirectional attention generative adversarial network (BA-GAN). The network builds a bidirectional attention multi-modal similarity model that establishes a one-to-one correspondence between text and image through mutual learning, both at the level of whole sentences and images and at the level of words in a sentence and sub-regions of an image. In addition, a deep attention fusion structure is constructed to generate more realistic and reliable images; it uses multiple branches to obtain fused deep features and improves the generator's ability to extract text semantic features. Extensive experiments show that our model significantly improves performance.
This work is supported by the National Natural Science Foundation of China (No. 61977052).
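The word-to-region half of such a bidirectional similarity model can be illustrated with a short sketch: each word attends over image sub-regions, a region context vector is formed per word, and the per-word cosine relevances are aggregated into one sentence–image matching score. This is a minimal NumPy illustration in the spirit of attention-driven image–text matching (as in AttnGAN's DAMSM), not the paper's actual implementation; the function name, the shapes, and the smoothing factors `gamma1`/`gamma2` are illustrative assumptions.

```python
import numpy as np

def word_region_similarity(words, regions, gamma1=4.0, gamma2=5.0):
    """Score how well word embeddings (T x D) match region features (N x D).

    Illustrative sketch: for each word, attend over image sub-regions,
    build a region context vector, measure word-context cosine similarity,
    and aggregate the per-word scores into one sentence-image score.
    """
    # Cosine-normalise both modalities.
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)

    # Word-to-region attention weights (T x N), softmax over regions.
    s = gamma1 * (w @ r.T)
    alpha = np.exp(s - s.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)

    # Region context vector for each word (T x D), re-normalised.
    context = alpha @ r
    context /= np.linalg.norm(context, axis=1, keepdims=True)

    # Per-word relevance, then log-sum-exp aggregation to a sentence score.
    rel = np.sum(w * context, axis=1)
    return float(np.log(np.sum(np.exp(gamma2 * rel))) / gamma2)
```

In a bidirectional scheme, the same routine would also run with the roles swapped (regions attending over words), and the two directional scores would be combined in the matching loss so that text and image learn from each other.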
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Yang, T., Tian, X., Jia, N., Gao, Y., Jiao, L. (2022). BA-GAN: Bidirectional Attention Generation Adversarial Network for Text-to-Image Synthesis. In: Shi, Z., Jin, Y., Zhang, X. (eds) Intelligence Science IV. ICIS 2022. IFIP Advances in Information and Communication Technology, vol 659. Springer, Cham. https://doi.org/10.1007/978-3-031-14903-0_16
Print ISBN: 978-3-031-14902-3
Online ISBN: 978-3-031-14903-0