Text-to-Image Synthesis with Threshold-Equipped Matching-Aware GAN

Shang, Jun; Yu, Wenxin; Che, Lu; Zhang, Zhiqiang; Cai, Hongjie; Deng, Zhiyu; Gong, Jun; Chen, Peng

doi:10.1007/978-981-99-8148-9_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1966)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In this paper, we propose a novel Equipped with Threshold Matching-Aware Generative Adversarial Network (ETMA-GAN) for text-to-image synthesis. By filtering out inaccurate negative samples, the discriminator can judge more accurately whether the generator has produced images that match their descriptions. In addition, a word fine-grained supervisor is constructed to strengthen the discriminator in distinguishing and capturing key semantic information, which in turn drives the generator to synthesize high-quality image details. Extensive experiments and ablation studies on the Caltech-UCSD Birds 200 (CUB) and Microsoft Common Objects in Context (MS COCO) datasets demonstrate the effectiveness of the proposed method and its superiority over existing methods. In both subjective and objective evaluations, the proposed model shows advantages over recent state-of-the-art methods, particularly in the realism of the synthesized images and their conformity to the text descriptions.
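The abstract does not state how inaccurate negative samples are identified, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that mismatched (negative) image-text pairs are built by shifting the captions within a batch, that a negative counts as inaccurate when the shifted caption is too similar to the true caption, and that similarity is measured by cosine similarity on sentence embeddings. The function names, the threshold value of 0.8 and the hinge-style loss are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two embedding vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def filter_negatives(text_embs, threshold=0.8):
    """Build mismatched captions by shifting the batch by one position and
    flag which of them are usable negatives.

    Assumption: a mismatched caption that is nearly identical to the true
    caption is an inaccurate negative and should be dropped.
    """
    shifted = text_embs[1:] + text_embs[:1]
    keep = [
        cosine_similarity(true_emb, wrong_emb) < threshold
        for true_emb, wrong_emb in zip(text_embs, shifted)
    ]
    return shifted, keep


def masked_mismatch_loss(d_scores, keep):
    """Hinge-style discriminator loss on mismatched pairs, averaged over
    the negatives that survived filtering (hypothetical loss choice)."""
    kept = [max(0.0, 1.0 + s) for s, k in zip(d_scores, keep) if k]
    return sum(kept) / len(kept) if kept else 0.0


# Toy batch: captions 0 and 1 are near-duplicates, caption 2 is unrelated.
text_embs = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
shifted, keep = filter_negatives(text_embs)
loss = masked_mismatch_loss([5.0, -2.0, 0.0], keep)
```

In this toy batch the first mismatched pair is discarded because its caption is almost the same as the true one, so the discriminator is not penalized for accepting an image that in fact fits the "wrong" caption.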

This research is supported by the National Key Research and Development Program of the Ministry of Science and Technology of the PRC (No. 2018AAA0101801, No. 2021ZD0110600), the Sichuan Science and Technology Program (No. 2022ZYD0116), the Sichuan Provincial M. C. Integration Office Program, and the IEDA Laboratory of SWUST.

Author information

Authors and Affiliations

Southwest University of Science and Technology, Mianyang, Sichuan, China
Jun Shang, Wenxin Yu, Lu Che, Zhiqiang Zhang, Hongjie Cai & Zhiyu Deng
Southwest Automation Research Institute, Chengdu, China
Jun Gong
Chengdu Hongchengyun Technology Co. Ltd., Chengdu, China
Peng Chen
Instrumentation Technology and Economy Institute, Chengdu, People’s Republic of China
Jun Shang

Corresponding author

Correspondence to Wenxin Yu.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Shang, J. et al. (2024). Text-to-Image Synthesis with Threshold-Equipped Matching-Aware GAN. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1966. Springer, Singapore. https://doi.org/10.1007/978-981-99-8148-9_13

DOI: https://doi.org/10.1007/978-981-99-8148-9_13
Published: 26 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8147-2
Online ISBN: 978-981-99-8148-9
eBook Packages: Computer Science, Computer Science (R0)
