Abstract
Text-to-image synthesis has made considerable progress in recent years, but that progress is largely confined to simple scenes: existing techniques become ineffective when the input text is complex and describes multiple objects. To address this challenging issue, we propose a novel text-to-image synthesis model, the Object-driven Self-Attention Generative Adversarial Network (Obj-SA-GAN), in which self-attention mechanisms analyse information at different granularities at different stages, fully exploiting the semantic information in the text from coarse to fine. We evaluate the proposed model on complex datasets, and the experimental results show that it outperforms state-of-the-art methods. This is because Obj-SA-GAN makes fuller use of the textual information and therefore understands complex scenes better.
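The core building block named in the title can be illustrated concretely. The following is a minimal NumPy sketch of a SAGAN-style self-attention layer over a flattened feature map, of the kind the abstract describes; it is not the authors' Obj-SA-GAN implementation, and the projection matrices `Wq`, `Wk`, `Wv` and the feature shapes are hypothetical.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Self-attention over x of shape (n, d): n spatial positions, d channels."""
    q = x @ Wq                                   # queries, shape (n, dk)
    k = x @ Wk                                   # keys,    shape (n, dk)
    v = x @ Wv                                   # values,  shape (n, d)
    scores = q @ k.T / np.sqrt(q.shape[1])       # scaled dot-product logits (n, n)
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax: each row sums to 1
    return x + attn @ v                          # residual connection, as in SAGAN

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))                     # 16 positions, 8 channels (toy sizes)
Wq = rng.normal(size=(8, 4))                     # hypothetical learned projections
Wk = rng.normal(size=(8, 4))
Wv = rng.normal(size=(8, 8))
y = self_attention(x, Wq, Wk, Wv)
print(y.shape)                                   # (16, 8): same shape as the input
```

Because every position attends to every other, such a layer can relate distant objects in the scene, which is why attention at several stages and granularities helps with multi-object prompts.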
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Li, R., Li, W., Yang, Y., Bai, Q. (2022). Obj-SA-GAN: Object-Driven Text-to-Image Synthesis with Self-Attention Based Full Semantic Information Mining. In: Khanna, S., Cao, J., Bai, Q., Xu, G. (eds) PRICAI 2022: Trends in Artificial Intelligence. PRICAI 2022. Lecture Notes in Computer Science, vol 13629. Springer, Cham. https://doi.org/10.1007/978-3-031-20862-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20861-4
Online ISBN: 978-3-031-20862-1