Abstract
In this paper, we propose a novel Threshold-Equipped Matching-Aware Generative Adversarial Network (ETMA-GAN) for text-to-image synthesis. By filtering out inaccurate negative samples, the discriminator can judge more accurately whether the generator has produced images that match their descriptions. In addition, to strengthen the discriminator's ability to identify and capture key semantic information, we construct a word-level fine-grained supervisor, which in turn drives the generator to synthesize high-quality image details. Extensive experiments and ablation studies on the Caltech-UCSD Birds 200 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate the effectiveness of the proposed method and its superiority over existing methods. In both subjective and objective evaluations, the proposed model outperforms recent state-of-the-art methods, in particular producing synthetic images that are more realistic and conform better to their text descriptions.
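The abstract does not say how the threshold is applied, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that mismatched (image, caption) pairs are formed by shifting captions within a batch, that caption similarity is measured by cosine similarity of sentence embeddings, and that a hinge loss is used on the mismatched pairs. The function names, the shifting scheme, and the default threshold of 0.8 are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_negatives(text_emb, threshold=0.8):
    """Assumed scheme: image i is paired with the caption of sample (i + 1) % n.

    The pair is kept as a negative only if that caption is dissimilar enough
    from image i's own caption; near-duplicate captions would be inaccurate
    negatives. Returns a list of 0/1 keep flags, one per sample.
    """
    n = len(text_emb)
    keep = []
    for i in range(n):
        j = (i + 1) % n
        keep.append(1 if cosine(text_emb[i], text_emb[j]) < threshold else 0)
    return keep


def mismatch_loss(d_scores, keep):
    """Hinge loss on mismatched pairs, averaged over the kept negatives only.

    d_scores[i] is a (hypothetical) discriminator output for the pair
    (image i, caption (i + 1) % n).
    """
    kept = [max(0.0, 1.0 + s) for s, k in zip(d_scores, keep) if k]
    return sum(kept) / len(kept) if kept else 0.0


# Toy batch: captions 0 and 1 are nearly identical, caption 2 is different.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
keep = filter_negatives(emb, threshold=0.8)
loss = mismatch_loss([0.5, -2.0, 0.0], keep)
```

In this toy batch the first pair is dropped because its two captions are almost the same, so it no longer contributes to the discriminator's mismatch term.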
This research is supported by the National Key Research and Development Program of the Ministry of Science and Technology of the PRC (Nos. 2018AAA0101801 and 2021ZD0110600), the Sichuan Science and Technology Program (No. 2022ZYD0116), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shang, J. et al. (2024). Text-to-Image Synthesis with Threshold-Equipped Matching-Aware GAN. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_13
DOI: https://doi.org/10.1007/978-981-99-8148-9_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)