Image Generation from Text Using StackGAN with Consistency Regularization

Tominaga, Rihito; Seo, Masataka

doi:10.1007/978-3-031-20859-1_9

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 583)

Included in the following conference series:

International Symposium on Distributed Computing and Artificial Intelligence

Abstract

Image generation from natural language has been one of the most exciting areas of multimodal image and language research in recent years. The Stacked Generative Adversarial Networks (StackGAN) model generates images from text. Although it has succeeded in generating high-resolution images, it has two problems: generated images can be unintelligible, and mode collapse can occur. This study attempts to solve both problems and to generate more accurate images. We propose incorporating balanced consistency regularization (bCR) into StackGAN. The bCR method uses data augmentation to learn the meaning of the data by making the discriminator's outputs consistent between an image and its augmented version. Additionally, bCR can stabilize training in adversarial networks. Our experiments show that the Inception Score of StackGAN with bCR was 7% higher than that of StackGAN alone. In addition, mode collapse was eliminated.
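
As a rough illustration of the consistency idea described in the abstract, the sketch below computes a bCR-style penalty from discriminator scores. It is not the authors' implementation. It assumes the general form of balanced consistency regularization, in which the discriminator is penalized for scoring an image and its augmented copy differently, with one term for real images and one for generated images. The function names, the plain-list inputs, and the default weights `lambda_real` and `lambda_fake` are hypothetical choices made for this example.

```python
def consistency_term(scores, scores_aug):
    """Mean squared gap between discriminator scores for images and
    the scores for augmented copies of the same images."""
    if len(scores) != len(scores_aug) or not scores:
        raise ValueError("score lists must be non-empty and of equal length")
    gaps = [(s - a) ** 2 for s, a in zip(scores, scores_aug)]
    return sum(gaps) / len(gaps)


def bcr_penalty(d_real, d_real_aug, d_fake, d_fake_aug,
                lambda_real=10.0, lambda_fake=10.0):
    """bCR-style penalty, intended to be added to the discriminator loss.

    d_real / d_real_aug: scores for real images and their augmented copies.
    d_fake / d_fake_aug: scores for generated images and their augmented copies.
    lambda_real / lambda_fake: weights (placeholder values, an assumption here).
    """
    real_term = consistency_term(d_real, d_real_aug)
    fake_term = consistency_term(d_fake, d_fake_aug)
    return lambda_real * real_term + lambda_fake * fake_term


if __name__ == "__main__":
    # Toy scores: the discriminator is nearly consistent on real images
    # and less consistent on generated ones.
    penalty = bcr_penalty(
        d_real=[0.9, 0.8], d_real_aug=[0.9, 0.7],
        d_fake=[0.2, 0.1], d_fake_aug=[0.5, 0.1],
    )
    print(f"bCR penalty: {penalty:.4f}")
```

In a training loop, this penalty would be added to the usual adversarial discriminator loss. How the augmentation is chosen and how the term is applied across StackGAN's stages are details this page does not give.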

Author information

Authors and Affiliations

Osaka Institute of Technology, 1-45 Chayamachi, Kita-ku, Osaka, Japan
Rihito Tominaga & Masataka Seo

Corresponding author

Correspondence to Rihito Tominaga.

Editor information

Editors and Affiliations

Hiroshima University, Hiroshima, Japan
Sigeru Omatu
King Abdulaziz University, Jeddah, Saudi Arabia
Rashid Mehmood
Kielce University of Technology, Kielce, Poland
Pawel Sitek
Palazzo Camponeschi, University of L'Aquila, L'Aquila, Italy
Serafino Cicerone
BISITE, Edificio I+D+i, University of Salamanca, Salamanca, Spain
Sara Rodríguez

Cite this paper

Tominaga, R., Seo, M. (2023). Image Generation from Text Using StackGAN with Consistency Regularization. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_9

DOI: https://doi.org/10.1007/978-3-031-20859-1_9
Published: 13 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)