Abstract
This paper presents an improved adversarial network for generating visual content from textual descriptions. Synthesizing high-quality images from text is among the most challenging problems in computer vision. Existing methods first generate an initial image sketch and then refine it to add fine-grained detail to different regions of the image. Most available text-to-image generation approaches roughly reflect the meaning of a given text description but fail to render details and distinct object parts, for two reasons: (1) they depend on the initially generated image, so if the initial image is poor, the refinement stage cannot produce a detailed, fine-grained result; and (2) the importance of each word varies with the image's content, yet the same text representation is used regardless of that content. Here, an improved adversarial network based on hyper-parameter optimization is proposed to generate fine-grained images. Inception Score (IS), t-Distributed Stochastic Neighbor Embedding (t-SNE), and R-precision are used as metrics to evaluate and automatically refine the initial image. An attention mechanism attends to the most informative words of the text description to generate more refined sub-parts of the image, and an attentional module computes an image-text matching loss for generator training. The proposed model is evaluated on the Caltech-UCSD Birds 200 dataset. Results in terms of Inception Score, R-precision, and t-SNE show that the model performs favourably against the state-of-the-art approaches AttnGAN (51) and DM-GAN (61), improving on them by 25.72% and 19.37%, respectively, in Inception Score.
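The Inception Score used above rewards images that are individually classified with confidence while collectively covering many classes: IS = exp(E_x[KL(p(y|x) || p(y))]). As a minimal sketch (assuming class probabilities have already been obtained from a pretrained Inception-v3 classifier, which is not shown here), it can be computed as:

```python
import numpy as np

def inception_score(preds, eps=1e-12):
    """Inception Score from an (N, C) matrix of class probabilities,
    one softmax row per generated image: exp(E_x[KL(p(y|x) || p(y))])."""
    preds = np.asarray(preds, dtype=np.float64)
    marginal = preds.mean(axis=0)  # p(y): marginal class distribution
    # Per-image KL divergence between conditional and marginal distributions
    kl = preds * (np.log(preds + eps) - np.log(marginal + eps))
    return float(np.exp(kl.sum(axis=1).mean()))

# A generator whose samples are all classified uniformly (no confidence,
# no diversity beyond the marginal) gets the minimum score of 1.0;
# confident, diverse predictions push the score toward the class count.
print(round(inception_score(np.full((4, 10), 0.1)), 3))  # -> 1.0
```

In practice the score is usually averaged over several splits of the generated set, and its value depends on the classifier used, so scores are only comparable when computed with the same Inception network.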
Data Availability
Work is still in progress; the data is not publicly available.
References
Abbood SH, Abdull Hamed HN, Mohd Rahim MS, Alaidi AHM, Salim ALRikabi HT (2022) Dr-ll gan: Diabetic retinopathy lesions synthesis using generative adversarial network. International Journal of Online & Biomedical Engineering 18(3)
Aggarwal A, Alshehri M, Kumar M, Sharma P, Alfarraj O, Deep V (2021) Principal component analysis, hidden markov model, and artificial neural network inspired techniques to recognize faces. Concurr Comput: Pract Exper 33(9):6157
Aggarwal A, Kumar M (2021) Image surface texture analysis and classification using deep learning. Multimed Tools Appl 80(1):1289–1309
Agnese J, Herrera J, Tao H, Zhu X (2020) A survey and taxonomy of adversarial neural networks for text-to-image synthesis. Wiley Interdiscip Rev: Data Min Knowl Discov 10(4):1345
Banerjee S, Das S (2020) Sd-gan: Structural and denoising gan reveals facial parts under occlusion. arXiv:2002.08448
Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P (2016) Infogan: Interpretable representation learning by information maximizing generative adversarial nets. Advances in neural information processing systems 29
Cheng J, Wu F, Tian Y, Wang L, Tao D (2020) Rifegan: Rich feature generation for text-to-image synthesis from prior knowledge. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 10911–10920
Dash A, Gamboa JCB, Ahmed S, Liwicki M, Afzal MZ (2017) Tac-gan-text conditioned auxiliary classifier generative adversarial network. arXiv:1703.06412
Ding M, Yang Z, Hong W, Zheng W, Zhou C, Yin D, Lin J, Zou X, Shao Z, Yang H et al (2021) Cogview: Mastering text-to-image generation via transformers. arXiv:2105.13290
Dolhansky B, Ferrer CC (2018) Eye in-painting with exemplar generative adversarial networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 7902–7911
Dong H, Yu S, Wu C, Guo Y (2017) Semantic image synthesis via adversarial learning. In: Proceedings of the IEEE International conference on computer vision, pp 5706–5714
Fu A, Hou Y (2017) Text-to-image generation using multi-instance stackgan
Gao L, Chen D, Zhao Z, Shao J, Shen HT (2021) Lightweight dynamic conditional gan with pyramid attention for text-to-image synthesis. Pattern Recogn 110:107384
Garg K, Singh V, Tiwary US (2021) Textual description generation for visual content using neural networks. In: International Conference on intelligent human computer interaction, pp 16–26. Springer
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Advances in neural information processing systems 27
Gou Y, Wu Q, Li M, Gong B, Han M (2020) Segattngan: Text to image generation with segmentation attention. arXiv:2005.12444
Heusel M, Ramsauer H, Unterthiner T, Nessler B, Hochreiter S (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems 30
Hinz T, Heinrich S, Wermter S (2019) Semantic object accuracy for generative text-to-image synthesis. arXiv:1910.13321
Hong S, Yang D, Choi J, Lee H (2018) Inferring semantic layout for hierarchical text-to-image synthesis. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 7986–7994
Huang H, Yu PS, Wang C (2018) An introduction to image synthesis with generative adversarial nets. arXiv:1803.04469
Kiros R, Salakhutdinov R, Zemel RS (2014) Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539
Kiros R, Salakhutdinov R, Zemel R (2014) Multimodal neural language models. In: International Conference on Machine Learning, pp 595–603. PMLR
Kumar M, Aggarwal J, Rani A, Stephan T, Shankar A, Mirjalili S (2021) Secure video communication using firefly optimization and visual cryptography. Artificial Intelligence Review, pp 1–21
Lee S, Tariq S, Shin Y, Woo SS (2021) Detecting handcrafted facial image manipulations and gan-generated facial images using shallow-fakefacenet. Appl Soft Comput 105:107256
Li B, Qi X, Lukasiewicz T, Torr P (2019) Controllable text-to-image generation. Advances in Neural Information Processing Systems 32
Li W, Zhang P, Zhang L, Huang Q, He X, Lyu S, Gao J (2019) Object-driven text-to-image synthesis via adversarial training. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 12174–12182
Mansimov E, Parisotto E, Ba JL, Salakhutdinov R (2015) Generating images from captions with attention. arXiv:1511.02793
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv:1411.1784
Mishra P, Rathore TS, Shivani S, Tendulkar S (2020) Text to image synthesis using residual gan. In: 2020 3rd International conference on emerging technologies in computer engineering: Machine learning and internet of things (ICETCE), pp. 139–144. IEEE
Nguyen A, Clune J, Bengio Y, Dosovitskiy A, Yosinski J (2017) Plug & play generative networks: Conditional iterative generation of images in latent space. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 4467–4477
Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier gans. In: International conference on machine learning, pp 2642–2651. PMLR
Pan Z, Yu W, Yi X, Khan A, Yuan F, Zheng Y (2019) Recent progress on generative adversarial networks (gans): a survey. IEEE Access 7:36322–36333
Peng D, Yang W, Liu C, Lü S (2021) Sam-gan: Self-attention supporting multi-stage generative adversarial networks for text-to-image synthesis. Neural Netw 138:57–67
Qiao T, Zhang J, Xu D, Tao D (2019) Mirrorgan: Learning text-to-image generation by redescription. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 1505–1514
Ramesh A, Pavlov M, Goh G, Gray S, Voss C, Radford A, Chen M, Sutskever I (2021) Zero-shot text-to-image generation. arXiv:2102.12092
Reed S, Akata Z, Lee H, Schiele B (2016) Learning deep representations of fine-grained visual descriptions. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 49–58
Reed SE, Akata Z, Mohan S, Tenka S, Schiele B, Lee H (2016) Learning what and where to draw. Adv Neural Inf Process Syst 29:217–225
Reed S, Akata Z, Yan X, Logeswaran L, Schiele B, Lee H (2016) Generative adversarial text to image synthesis. In: International conference on machine learning, pp 1060–1069. PMLR
Sah S, Peri D, Shringi A, Zhang C, Dominguez M, Savakis A, Ptucha R (2018) Semantically invariant text-to-image generation. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp 3783–3787. IEEE
Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X (2016) Improved techniques for training gans. Adv Neural Inf Process Syst 29:2234–2242
Sun Q, Chang K-H, Dormer KJ, Dyer Jr RK, Gan RZ (2002) An advanced computer-aided geometric modeling and fabrication method for human middle ear. Med Eng Phys 24(9):595–606
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 2818–2826
Tao M, Tang H, Wu S, Sebe N, Jing X-Y, Wu F, Bao B (2020) Df-gan: Deep fusion generative adversarial networks for text-to-image synthesis. arXiv:2008.05865
Valle R (2019) Hands-on generative adversarial networks with keras: Your guide to implementing next-generation generative adversarial networks. Packt Publishing Ltd
Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: a neural image caption generator. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 3156–3164
Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset
Xia W, Yang Y, Xue J-H, Wu B (2021) Tedigan: Text-guided diverse face image generation and manipulation. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 2256–2265
Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International conference on machine learning, pp 2048–2057. PMLR
Xu T, Zhang P, Huang Q, Zhang H, Gan Z, Huang X, He X (2018) Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In: Proceedings of the IEEE Conference on computer vision and pattern recognition, pp 1316–1324
Ye H, Yang X, Takac M, Sunderraman R, Ji S (2021) Improving text-to-image synthesis using contrastive learning. arXiv:2107.02423
Yuan M, Peng Y (2018) Text-to-image synthesis via symmetrical distillation networks, pp 1407–1415
Zakraoui J, Saleh M, Al-Maadeed S, Jaam JM (2021) Improving text-to-image generation with object layout guidance. Multimedia Tools and Applications, pp 1–21
Zhang Y, Han S, Zhang Z, Wang J, Bi H (2022) Cf-gan: cross-domain feature fusion generative adversarial network for text-to-image synthesis. The Visual Computer, pp 1–11
Zhang H, Koh JY, Baldridge J, Lee H, Yang Y (2021) Cross-modal contrastive learning for text-to-image generation. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 833–842
Zhang C, Peng Y (2018) Stacking vae and gan for context-aware text-to-image generation. In: 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM), pp 1–5. IEEE
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2017) Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In: Proceedings of the IEEE International conference on computer vision, pp 5907–5915
Zhang H, Xu T, Li H, Zhang S, Wang X, Huang X, Metaxas DN (2018) Stackgan++: Realistic image synthesis with stacked generative adversarial networks. IEEE Trans Pattern Anal Mach Intell 41(8):1947–1962
Zhou P, Yu N, Wu Z, Davis LS, Shrivastava A, Lim S-N (2021) Deep video inpainting detection. arXiv:2101.11080
Zhu M, Pan P, Chen W, Yang Y (2019) Dm-gan: Dynamic memory generative adversarial networks for text-to-image synthesis. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pp 5802–5810
Funding
The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Software, Writing - original draft preparation, Writing - review and editing: Varsha Singh; Validation: Varsha Singh and Uma Shanker Tiwary; Funding acquisition, Project administration, Resources and Supervision: Uma Shanker Tiwary.
Ethics declarations
Ethics approval
We confirm that the manuscript has been read and approved by both named authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by both of us.
Conflict of interest/Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Singh, V., Tiwary, U.S. Visual content generation from textual description using improved adversarial network. Multimed Tools Appl 82, 10943–10960 (2023). https://doi.org/10.1007/s11042-022-13720-3