Text to Image Synthesis Based on Multiple Discrimination

Zhang, Zhiqiang; Zhang, Yunye; Yu, Wenxin; Lu, Jingwei; Nie, Li; He, Gang; Jiang, Ning; He, Gang; Fan, Yibo; Yang, Zhuo

doi:10.1007/978-3-030-30508-6_46

ORCID: Zhiqiang Zhang, orcid.org/0000-0002-2408-366X; Wenxin Yu, orcid.org/0000-0002-6093-5516

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11729)

Included in the following conference series:

International Conference on Artificial Neural Networks

Abstract

We propose MD-GAN, a novel and simple text-to-image synthesizer that uses multiple discrimination. Building on the Generative Adversarial Network (GAN), we feed segmentation images to the discriminator to strengthen its discrimination ability. A stronger discriminator in turn improves the generator, which yields high-resolution results. Experiments validate the performance of our algorithm. On the CUB dataset, our inception score is 27.7% and 1.7% higher than those of GAN-CLS-INT and GAWWN, respectively. On the flower dataset, it outperforms GAN-CLS-INT and StackGAN by 21.8% and 1.25%, respectively. Our model is also more concise in structure, and its training time is only half that of StackGAN.
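
The abstract reports its results as inception scores. As a rough illustration of what that metric measures, the sketch below computes the commonly used definition, IS = exp(E_x[KL(p(y|x) || p(y))]), from a list of per-image class probabilities. It is not the authors' evaluation code: in practice the probabilities come from a pretrained Inception network applied to many generated images, whereas here they are toy values. The function names `inception_score` and `relative_gain` are hypothetical, and `relative_gain` assumes the reported percentages are relative improvements over the baseline score, which the abstract does not state explicitly.

```python
import math


def inception_score(probs, eps=1e-12):
    """Inception score from per-image class probabilities p(y|x).

    probs: list of lists, one probability vector per image, each summing to 1.
    eps:   small constant that keeps log() finite for zero probabilities.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    # Mean KL divergence KL(p(y|x) || p(y)) over the images.
    mean_kl = 0.0
    for p in probs:
        mean_kl += sum(
            pk * (math.log(pk + eps) - math.log(mk + eps))
            for pk, mk in zip(p, marginal)
        )
    mean_kl /= n
    return math.exp(mean_kl)


def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent (assumed convention)."""
    return 100.0 * (ours - baseline) / baseline


if __name__ == "__main__":
    # Toy inputs only: two confident, distinct predictions versus two uniform ones.
    print(inception_score([[1.0, 0.0], [0.0, 1.0]]))
    print(inception_score([[0.5, 0.5], [0.5, 0.5]]))
```

The score is high when each image is classified confidently (low-entropy p(y|x)) and the set of images covers many classes (high-entropy p(y)), which is why it is used as a combined proxy for image quality and diversity.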

Acknowledgement

This research was supported by grants 2018GZ0517, 2019YFS0146, and 2019YFS0155 from the Sichuan Provincial Science and Technology Department, grant 2018KF003 from the State Key Laboratory of ASIC & System, and grant 2017B010110007 from the Science and Technology Planning Project of Guangdong Province.

Author information

Authors and Affiliations

Southwest University of Science and Technology, Mianyang, China
Zhiqiang Zhang, Yunye Zhang, Wenxin Yu, Li Nie, Gang He & Ning Jiang
Cadence Design Systems, Inc., San Jose, USA
Jingwei Lu
Xidian University, Xi’an, China
Gang He
State Key Laboratory of ASIC and System, Fudan University, Shanghai, China
Yibo Fan
Guangdong University of Technology, Guangzhou, China
Zhuo Yang

Corresponding author

Correspondence to Wenxin Yu.

Editor information

Editors and Affiliations

Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Igor V. Tetko
Institute of Computer Science, Czech Academy of Sciences, Praha 8, Czech Republic
Věra Kůrková
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Pavel Karpov
Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Neuherberg, Germany
Fabian Theis

About this paper

Cite this paper

Zhang, Z. et al. (2019). Text to Image Synthesis Based on Multiple Discrimination. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Image Processing. ICANN 2019. Lecture Notes in Computer Science, vol 11729. Springer, Cham. https://doi.org/10.1007/978-3-030-30508-6_46

DOI: https://doi.org/10.1007/978-3-030-30508-6_46
Published: 09 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30507-9
Online ISBN: 978-3-030-30508-6
eBook Packages: Computer Science, Computer Science (R0)