Image Generation from Sketches and Text-Guided Attribute Edition

  • Conference paper
  • First Online:
Information Management and Big Data (SIMBig 2022)

Abstract

New Artificial Intelligence models are created every day, and existing ones are modified or combined to extend the range of applications and tasks they can solve. This paper presents a novel approach that combines Natural Language Processing and generative networks to generate images from sketches and natural-language descriptions. We present the pipeline followed to recondition the generative network, in which the processed text provides the context for the sketch used in generation. Finally, the model and the generated images are evaluated and compared on benchmark data sets, and promising results are reported.


Notes

  1. Stochastic Maximum Likelihood, also known as Persistent Contrastive Divergence: a training method for models that can learn a probability distribution over their set of inputs.

  2. Byte Pair Encoding, which keeps the most frequent words intact while splitting rare ones into multiple subword tokens.

  3. Bidirectional Encoder Representations from Transformers, a transformer-based machine learning technique for natural language processing developed by Google.

  4. https://www.midjourney.com/home/.

  5. https://midjourney.gitbook.io/docs/.
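Note 2 can be made concrete with a minimal toy sketch of the BPE merge loop. This is an illustrative implementation only, not the tokenizer used in the paper: it repeatedly merges the most frequent adjacent symbol pair, so frequent words collapse to single tokens while rare words remain split into subwords.

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Toy Byte Pair Encoding: repeatedly merge the most frequent
    adjacent symbol pair across the corpus."""
    vocab = Counter(corpus)              # word -> frequency
    words = {w: list(w) for w in vocab}  # word -> current symbol sequence
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for w, freq in vocab.items():
            syms = words[w]
            for pair in zip(syms, syms[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word.
        for w, syms in words.items():
            out, i = [], 0
            while i < len(syms):
                if i + 1 < len(syms) and (syms[i], syms[i + 1]) == best:
                    out.append(syms[i] + syms[i + 1])
                    i += 2
                else:
                    out.append(syms[i])
                    i += 1
            words[w] = out
    return words, merges

# A frequent word collapses to one token; a rare word stays split.
words, merges = train_bpe(["low"] * 5 + ["lowest"], num_merges=2)
print(words["low"])     # ['low']
print(words["lowest"])  # ['low', 'e', 's', 't']
```

Real BPE tokenizers operate on bytes or characters with learned merge tables over large corpora; the hypothetical `train_bpe` above only shows the core frequency-driven merge idea behind the footnote.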


Author information

Correspondence to Dennis Marcell Sumiri Fernandez or José Ochoa-Luna.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Sumiri Fernandez, D.M., Ochoa-Luna, J. (2023). Image Generation from Sketches and Text-Guided Attribute Edition. In: Lossio-Ventura, J.A., Valverde-Rebaza, J., Díaz, E., Alatrista-Salas, H. (eds) Information Management and Big Data. SIMBig 2022. Communications in Computer and Information Science, vol 1837. Springer, Cham. https://doi.org/10.1007/978-3-031-35445-8_7


  • DOI: https://doi.org/10.1007/978-3-031-35445-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-35444-1

  • Online ISBN: 978-3-031-35445-8

  • eBook Packages: Computer Science, Computer Science (R0)
