Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models

  • Conference paper
Culture and Computing (HCII 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14035)

Abstract

Deep generative models have attracted considerable attention for their strong performance in generating original images across many real-world domains. One application of these models is style transfer, in which the style of one object is transferred to the content of another. This study proposes a three-part pipeline for transferring the multimodal style of songs to album covers. First, a multimodal latent space is trained with a triplet network on a dataset of cover images and songs represented as spectrograms, spanning about 18 genres. Second, a k-nearest-neighbor (kNN) search over this latent space retrieves the cover art closest to a query song. Finally, a Spectral Normalized GAN pretrained on ImageNet is fine-tuned, training only the batch parameters to avoid overfitting, and original cover art is then sampled from it. The pipeline is run for songs from 10 different genres: the 100 nearest neighbors contain covers of similar genres, and the generated images reach an average Fréchet Inception Distance of 20.89.
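The first two stages of the pipeline can be illustrated with a minimal sketch. It assumes a hinge-style triplet loss on Euclidean distances and a brute-force nearest-neighbor search; the abstract does not state the distance, the margin, or the search implementation, so the function names, the margin value, and the toy 2-D embeddings below are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form): zero once the negative
    is at least `margin` farther from the anchor than the positive."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

def nearest_covers(query, cover_embeddings, k=1):
    """Brute-force kNN: indices of the k covers closest to the query."""
    order = sorted(
        range(len(cover_embeddings)),
        key=lambda i: euclidean(query, cover_embeddings[i]),
    )
    return order[:k]

# Toy 2-D embeddings, made up for illustration (not data from the paper).
song = [0.0, 0.0]
covers = [[3.0, 4.0], [0.5, 0.0], [1.0, 1.0]]

# Cover 1 plays the matching (positive) cover, cover 0 the mismatched one.
loss = triplet_loss(song, covers[1], covers[0])
neighbors = nearest_covers(song, covers, k=2)
```

In this toy setup the loss is already zero because the matching cover lies well inside the margin, and the search returns the two closest covers in order of distance. A real system would replace the brute-force sort with an indexed search library once the number of covers grows.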

Notes

  1. https://faiss.ai/

  2. https://developer.spotify.com/documentation/web-api/

  3. https://github.com/spotDL/spotify-downloader

Acknowledgements

The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.

Corresponding author

Correspondence to Marcelo Mendoza.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Apolo, M.J., Mendoza, M. (2023). Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_17

  • DOI: https://doi.org/10.1007/978-3-031-34732-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34731-3

  • Online ISBN: 978-3-031-34732-0

  • eBook Packages: Computer Science, Computer Science (R0)
