Abstract
Image generation from natural language has been one of the most exciting areas of multimodal image and language research in recent years. The Stacked Generative Adversarial Networks (StackGAN) model generates images from text and has succeeded in producing high-resolution images, but it has two problems: generated images can be unintelligible, and training can suffer from mode collapse. This study addresses both problems with the aim of generating more accurate images. We propose incorporating balanced consistency regularization (bCR) into StackGAN. The bCR method uses data augmentation to learn the meaning of the data by making the discriminator's outputs consistent between an image and its augmented version. bCR can also stabilize training in adversarial networks. In our experiments, the Inception Score of StackGAN with bCR was 7% higher than that of StackGAN alone, and mode collapse was eliminated.
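To make the consistency idea concrete, the following is a minimal sketch of a bCR-style penalty, not the authors' implementation. It assumes the usual formulation in which the discriminator is penalized for the squared difference between its output on an image and on an augmented copy, for both real and generated images. The names `horizontal_flip`, `bcr_penalty`, the two toy discriminators, and the weights `lambda_real` and `lambda_fake` are hypothetical and chosen only for illustration; the paper's augmentations and coefficients are not given in the abstract.

```python
import numpy as np


def horizontal_flip(images):
    """Hypothetical augmentation T: mirror a batch shaped (N, H, W, C) left to right."""
    return images[:, :, ::-1, :]


def bcr_penalty(discriminator, real, fake, augment,
                lambda_real=10.0, lambda_fake=10.0):
    """Consistency penalty added to the discriminator loss.

    The lambda values are placeholder assumptions, not values from the paper.
    """
    # Squared gap between outputs on real images and their augmented copies.
    loss_real = np.mean((discriminator(real) - discriminator(augment(real))) ** 2)
    # The same gap for generated images; this second term is the "balanced" part.
    loss_fake = np.mean((discriminator(fake) - discriminator(augment(fake))) ** 2)
    return lambda_real * loss_real + lambda_fake * loss_fake


def invariant_discriminator(images):
    """Toy stand-in: the mean pixel value does not change under a flip."""
    return images.mean(axis=(1, 2, 3))


def sensitive_discriminator(images):
    """Toy stand-in: column-dependent weights make the output change under a flip."""
    weights = np.linspace(0.0, 1.0, images.shape[2]).reshape(1, 1, -1, 1)
    return (images * weights).mean(axis=(1, 2, 3))


rng = np.random.default_rng(0)
real_batch = rng.random((4, 8, 8, 3))  # stand-in for real images
fake_batch = rng.random((4, 8, 8, 3))  # stand-in for generator output

penalty_invariant = bcr_penalty(invariant_discriminator, real_batch, fake_batch, horizontal_flip)
penalty_sensitive = bcr_penalty(sensitive_discriminator, real_batch, fake_batch, horizontal_flip)
print(penalty_invariant, penalty_sensitive)
```

In this toy setup the flip-invariant discriminator incurs essentially no penalty, while the flip-sensitive one is penalized. In an actual training loop the penalty would be added to the adversarial discriminator loss and computed with an automatic differentiation framework rather than NumPy.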
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tominaga, R., Seo, M. (2023). Image Generation from Text Using StackGAN with Consistency Regularization. In: Omatu, S., Mehmood, R., Sitek, P., Cicerone, S., Rodríguez, S. (eds) Distributed Computing and Artificial Intelligence, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 583. Springer, Cham. https://doi.org/10.1007/978-3-031-20859-1_9
DOI: https://doi.org/10.1007/978-3-031-20859-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20858-4
Online ISBN: 978-3-031-20859-1
eBook Packages: Intelligent Technologies and Robotics (R0)