
Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13629)

Abstract

In recent years, text-to-image synthesis techniques have made considerable breakthroughs, but progress has been restricted to simple scenes. Such techniques prove ineffective when the text is complex and describes multiple objects. To address this challenging issue, we propose a novel text-to-image synthesis model called the Object-driven Self-Attention Generative Adversarial Network (Obj-SA-GAN), in which self-attention mechanisms analyse information at different granularities at different stages, fully exploiting text semantic information from coarse to fine. We evaluate the proposed model on complex datasets, and the experimental results show that it outperforms state-of-the-art methods. This is because Obj-SA-GAN makes fuller use of textual information, giving it a better understanding of complex scenes.
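The self-attention mechanism at the core of the model computes, for every feature (a word embedding or an image region), a weighted combination of all other features, so that distant objects in a complex scene can influence one another. The sketch below is a minimal single-head self-attention layer in NumPy; the function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over a sequence of feature vectors.

    x           : (n, d) input features (e.g. word embeddings or
                  flattened image-region features)
    wq, wk, wv  : (d, d) learned projections for queries, keys, values
    Returns an (n, d) array where each row is an attention-weighted
    mixture of all value vectors.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])   # pairwise similarity, scaled
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v                          # mix values by attention weight
```

In a coarse-to-fine pipeline such as the one described here, a layer like this would be applied at each stage: first over object-level tokens (coarse layout), then over finer-grained region features, so later stages can refine details while staying consistent with the global scene.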



Author information

Correspondence to Ruijun Li or Weihua Li.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, R., Li, W., Yang, Y., Bai, Q. (2022). Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_25


  • DOI: https://doi.org/10.1007/978-3-031-20862-1_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20861-4

  • Online ISBN: 978-3-031-20862-1

  • eBook Packages: Computer Science, Computer Science (R0)
