Abstract
In this paper we present an original method for style transfer between music tracks. We use a recurrent model consisting of LSTM layers within an encoder-decoder architecture. We also present a method for the programmatic synthesis of sufficiently large, paired training datasets from MIDI data. Representing the data as the real and imaginary parts of the short-time Fourier transform allows the components of the music to be modeled independently. The proposed architecture improves upon state-of-the-art solutions in terms of efficiency and range of applications while achieving high network precision.
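As a rough illustration of the input representation the abstract describes, the sketch below computes the real and imaginary parts of the short-time Fourier transform and stacks them as two channels, one time step per STFT frame, so a recurrent encoder-decoder can consume them. This is our own minimal sketch, not the authors' code; the use of librosa, the window parameters, and the function names are assumptions for illustration.

```python
import numpy as np
import librosa


def stft_real_imag(path, n_fft=2048, hop_length=512):
    """Load audio and return frames of shape (time, bins, 2).

    The last axis holds the real and imaginary STFT components.
    n_fft and hop_length are illustrative choices, not taken from the paper.
    """
    y, sr = librosa.load(path, sr=None, mono=True)
    spec = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)  # (bins, time), complex
    # Stack real/imaginary as channels and put time first for an LSTM.
    return np.stack([spec.real, spec.imag], axis=-1).transpose(1, 0, 2)


def real_imag_to_audio(frames, hop_length=512):
    """Invert the (time, bins, 2) representation back to a waveform.

    Because phase is carried by the imaginary part, no phase-reconstruction
    step (e.g. Griffin-Lim) is needed, unlike magnitude-spectrogram pipelines.
    """
    spec = frames[..., 0] + 1j * frames[..., 1]
    return librosa.istft(spec.transpose(1, 0), hop_length=hop_length)
```

A network trained on such frames can predict the real and imaginary channels separately, which is one plausible reading of the "independent modeling of the music components" the abstract mentions.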
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Modrzejewski, M., Bereda, K., Rokita, P. (2021). Efficient Recurrent Neural Network Architecture for Musical Style Transfer. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, vol. 12854. Springer, Cham. https://doi.org/10.1007/978-3-030-87986-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87985-3
Online ISBN: 978-3-030-87986-0
eBook Packages: Computer Science (R0)