
Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13629)

Abstract

In recent years, text-to-image synthesis techniques have made considerable breakthroughs, but progress has been restricted to simple scenes. Such techniques prove ineffective when the text is complex and describes multiple objects. To address this challenging issue, we propose a novel text-to-image synthesis model called the Object-driven Self-Attention Generative Adversarial Network (Obj-SA-GAN), in which self-attention mechanisms analyse information at different granularities at different stages, fully exploiting text semantic information from coarse to fine. We evaluate the proposed model on complex datasets, and the experimental results show that it outperforms state-of-the-art methods. This is because Obj-SA-GAN makes fuller use of textual information, giving it a better understanding of complex scenes.
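The self-attention mechanism at the core of the model computes, for every feature (a word embedding or an image region), a weighted combination of all other features, so that distant objects in a complex scene can influence one another. The sketch below is a minimal single-head self-attention layer in NumPy; the function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over a sequence of feature vectors.

    x           : (n, d) input features (e.g. word embeddings or
                  flattened image-region features)
    wq, wk, wv  : (d, d) learned projections for queries, keys, values
    Returns an (n, d) array where each row is an attention-weighted
    mixture of all value vectors.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])   # pairwise similarity, scaled
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v                          # mix values by attention weight
```

In a coarse-to-fine pipeline such as the one described here, a layer like this would be applied at each stage: first over object-level tokens (coarse layout), then over finer-grained region features, so later stages can refine details while staying consistent with the global scene.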



Author information

Correspondence to Ruijun Li or Weihua Li.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, R., Li, W., Yang, Y., Bai, Q. (2022). Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_25


  • DOI: https://doi.org/10.1007/978-3-031-20862-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20861-4

  • Online ISBN: 978-3-031-20862-1

  • eBook Packages: Computer Science, Computer Science (R0)
