Abstract
New Artificial Intelligence models are created every day, and existing ones are modified or combined to extend the range of applications and tasks they can solve. This paper presents a novel approach that combines Natural Language Processing and generative networks to generate images from sketches and natural-language descriptions. We present the pipeline followed to recondition the generative network, including the processing of the text that gives context to the sketch used for generation. Finally, the model and the generated images are evaluated and compared on benchmark data sets, and promising results are reported.
Notes
- 1.
Stochastic Maximum Likelihood, also known as Persistent Contrastive Divergence: a method for training models that learn a probability distribution over their inputs.
- 2.
Byte Pair Encoding, a subword tokenization scheme that keeps the most frequent words intact while splitting rare words into multiple tokens.
- 3.
Bidirectional Encoder Representations from Transformers is a transformer-based machine learning technique for natural language processing. It was developed by Google.
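Note 2 above describes Byte Pair Encoding only informally. The toy sketch below (the corpus, word frequencies, and merge count are all invented for illustration) shows the mechanism the note refers to: the most frequent words end up as single tokens, while rare or unseen words are split into multiple subword tokens.

```python
# Minimal sketch of Byte Pair Encoding (BPE) on a toy corpus.
# Illustrative only: the corpus and merge count are made up.
from collections import Counter

def learn_bpe(word_freqs, num_merges):
    """Learn merge rules from a {word: count} dictionary."""
    # Start with each word as a tuple of characters.
    vocab = {tuple(w): c for w, c in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair
        merges.append(best)
        # Merge that pair everywhere it occurs.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = count
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

corpus = {"low": 50, "lower": 20, "newest": 30, "widest": 3}
merges = learn_bpe(corpus, num_merges=10)
print(segment("low", merges))      # frequent word stays a single token
print(segment("wildest", merges))  # unseen word splits into subwords
```

With this corpus, "low" is tokenized as a single symbol, while the unseen word "wildest" falls back to learned subword pieces, which is exactly the behaviour the footnote describes.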
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Sumiri Fernandez, D.M., Ochoa-Luna, J. (2023). Image Generation from Sketches and Text-Guided Attribute Edition. In: Lossio-Ventura, J.A., Valverde-Rebaza, J., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2022. Communications in Computer and Information Science, vol 1837. Springer, Cham. https://doi.org/10.1007/978-3-031-35445-8_7
Print ISBN: 978-3-031-35444-1
Online ISBN: 978-3-031-35445-8