Abstract
Deep generative models have attracted considerable attention for their excellent performance in generating original images across many real-world domains. One application of these models is style transfer, where the style of one object is transferred to the content of another. This study proposes a three-part pipeline for transferring the multimodal style of songs to album covers. First, a multimodal latent space is trained with a triplet network on a dataset of cover images and songs represented as spectrograms, spanning around 18 genres. Then, a k-nearest-neighbors (kNN) search in this latent space retrieves the cover art closest to a query song. Finally, a Spectral Normalized GAN pretrained on ImageNet is fine-tuned, training only the batch parameters to avoid overfitting, and original cover art is then sampled from it. The pipeline is executed for songs of 10 different genres, retrieving covers of similar genres among the 100 nearest neighbors and generating images with an average Fréchet Inception Distance of 20.89.
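The first two stages of the pipeline (a triplet objective over a shared latent space, followed by nearest-neighbor retrieval) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the hinge form of the triplet loss with Euclidean distance, the margin value, the function names, and the toy 2-D embeddings are hypothetical and are not taken from the chapter, which trains its embeddings with neural networks on spectrograms and cover images.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (assumed form, not confirmed by the source).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def nearest_covers(song_embedding, cover_embeddings, k=1):
    """Return the indices of the k cover embeddings closest to the song."""
    ranked = sorted(
        range(len(cover_embeddings)),
        key=lambda i: euclidean(song_embedding, cover_embeddings[i]),
    )
    return ranked[:k]


# Hypothetical 2-D embeddings standing in for the learned latent space.
covers = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
song = [0.9, 1.2]

closest = nearest_covers(song, covers, k=2)
loss = triplet_loss(song, covers[1], covers[2])
```

In this toy setting the query song sits next to the second cover, so that cover ranks first and the triplet loss with the distant third cover as the negative is already zero. In the pipeline described above, the retrieved cover would then serve as the target for fine-tuning the pretrained GAN.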
Acknowledgements
The authors acknowledge funding support from the Millennium Institute for Foundational Research on Data (IMFD ANID - Millennium Science Initiative Program - Code ICN17_002) and the National Center of Artificial Intelligence (CENIA FB210017, Basal ANID). Marcelo Mendoza was funded by the National Agency of Research and Development (ANID) grant FONDECYT 1200211. The funders played no role in the design of this study.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Apolo, M.J., Mendoza, M. (2023). Bimodal Style Transference from Musical Composition to Image Using Deep Generative Models. In: Rauterberg, M. (eds) Culture and Computing. HCII 2023. Lecture Notes in Computer Science, vol 14035. Springer, Cham. https://doi.org/10.1007/978-3-031-34732-0_17
DOI: https://doi.org/10.1007/978-3-031-34732-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34731-3
Online ISBN: 978-3-031-34732-0
eBook Packages: Computer Science (R0)