Bimodal Neural Style Transfer for Image Generation Based on Text Prompts

  • Conference paper
  • In: Culture and Computing (HCII 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14035)


Abstract

Neural networks have become one of the essential areas of Artificial Intelligence due to their extraordinary capacity to address problems in many domains. This ability has led to novel architectures and models for challenging tasks such as neural style transfer. We propose a novel methodology for bimodal style transfer that takes text as input. We start from one image and a short descriptive text (prompt), which are mapped into a common multimodal latent space. A new image is then retrieved using an image retrieval engine. Finally, a generative model combines content and style to create artistic images. The proposed system retrieves images that are semantically similar to the descriptive prompt, achieving high precision in image retrieval on the SemArt dataset. The neural style transfer model also preserves the image's high quality while combining style and content.
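
To make the pipeline concrete, the sketch below implements the retrieval stage with a CLIP-style joint embedding: the prompt and a gallery of paintings are encoded into a shared latent space and ranked by cosine similarity, and the top-ranked image is handed off to a style transfer step. This is a minimal illustration under stated assumptions, not the authors' implementation: the checkpoint name, the helper functions, and the availability of local SemArt image files are all assumptions.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Assumed public checkpoint; the paper does not specify which encoder it uses.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def embed_gallery(paths):
        # Map gallery paintings into the joint text-image latent space.
        images = [Image.open(p).convert("RGB") for p in paths]
        inputs = processor(images=images, return_tensors="pt").to(device)
        with torch.no_grad():
            feats = model.get_image_features(**inputs)
        return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize

    def retrieve(prompt, gallery_feats, paths, k=1):
        # Rank paintings by cosine similarity to the prompt embedding.
        inputs = processor(text=[prompt], return_tensors="pt", padding=True).to(device)
        with torch.no_grad():
            text_feat = model.get_text_features(**inputs)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        scores = gallery_feats @ text_feat.squeeze(0)  # (N,) similarities
        return [paths[i] for i in scores.topk(k).indices.tolist()]

    # Usage (paths to local SemArt images are assumed):
    # style_image = retrieve("a stormy seascape at dusk", embed_gallery(paths), paths)[0]
    # The retrieved image would then feed a VGG-based style transfer
    # optimization in the spirit of Gatys et al., together with a content image.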

Acknowledgements

The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.

Author information


Correspondence to Marcelo Mendoza.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Gutiérrez, D., Mendoza, M. (2023). Bimodal Neural Style Transfer for Image Generation Based on Text Prompts. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_29

  • DOI: https://doi.org/10.1007/978-3-031-34732-0_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34731-3

  • Online ISBN: 978-3-031-34732-0

  • eBook Packages: Computer Science, Computer Science (R0)
