Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models

Apolo, María José; Mendoza, Marcelo

doi:10.1007/978-3-031-34732-0_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14035)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

Deep generative models have attracted wide attention for their strong performance in generating original images across many real-world domains. One application of these models is style transfer, in which the style of one object is transferred to the content of another. This study proposes a three-stage pipeline for transferring the multimodal style of songs to album covers. First, a triplet network is trained to learn a multimodal latent space from a dataset of cover images and songs represented as spectrograms, spanning around 18 genres. Second, a k-nearest-neighbors (kNN) search in this latent space retrieves the cover art closest to a query song. Finally, a Spectral Normalized GAN pretrained on ImageNet is fine-tuned, training only the batch parameters to avoid overfitting, and original cover art is then sampled from it. The pipeline is run for songs of 10 different genres: the 100 nearest neighbors contain covers of similar genres, and the generated images reach an average Fréchet Inception Distance of 20.89.
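
The first two stages of the pipeline (a latent space learned with a triplet objective, followed by nearest-neighbor retrieval) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss form, the margin, or the distance metric, so the squared Euclidean distance, the `margin=1.0` default, and the function names `triplet_margin_loss` and `nearest_covers` are all assumptions made for illustration. The embeddings are taken as given arrays, standing in for the outputs of the trained network.

```python
import numpy as np


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss over a batch of embeddings.

    Assumption: squared Euclidean distance and a margin of 1.0;
    the abstract does not specify either choice.
    Each argument has shape (batch, dim).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    # Penalize triplets where the positive is not closer than the
    # negative by at least `margin`.
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))


def nearest_covers(query, cover_embeddings, k=100):
    """Return indices of the k cover embeddings closest to a song embedding.

    Assumption: brute-force Euclidean search; `query` has shape (dim,)
    and `cover_embeddings` has shape (n_covers, dim).
    """
    dists = np.linalg.norm(cover_embeddings - query, axis=1)
    k = min(k, len(cover_embeddings))
    return np.argsort(dists)[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned song/cover vectors.
    covers = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    song = np.array([0.9, 0.9])
    print(nearest_covers(song, covers, k=2))
```

In this toy setting a song embedding acts as the anchor, a cover of the same genre as the positive, and a cover of a different genre as the negative. That pairing is one plausible reading of the abstract rather than a confirmed detail of the paper.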

Acknowledgements

The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.

Author information

Authors and Affiliations

Department of Informatics, Universidad Técnica Federico Santa María, Av. Vicuña Mackenna 3939, Santiago, Chile
María José Apolo
Department of Computer Science, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 6840, Santiago, Chile
Marcelo Mendoza

Corresponding author

Correspondence to Marcelo Mendoza.

Editor information

Editors and Affiliations

Eindhoven University of Technology, Eindhoven, The Netherlands
Matthias Rauterberg

About this paper

Cite this paper

Apolo, M.J., Mendoza, M. (2023). Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_17

DOI: https://doi.org/10.1007/978-3-031-34732-0_17
Published: 09 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34731-3
Online ISBN: 978-3-031-34732-0
eBook Packages: Computer Science, Computer Science (R0)
