
Efficient Recurrent Neural Network Architecture for Musical Style Transfer

  • Conference paper
  • In: Artificial Intelligence and Soft Computing (ICAISC 2021)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12854)

Abstract

In this paper we present an original method for style transfer between music tracks. We use a recurrent model consisting of LSTM layers enclosed within an encoder-decoder architecture, and we introduce a method for the programmatic synthesis of sufficiently large, paired training datasets from MIDI data. Representing the audio as the real and imaginary parts of its short-time Fourier transform allows the components of the music to be modeled independently. The proposed architecture improves upon state-of-the-art solutions in terms of efficiency and range of applications while maintaining high precision.
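To make the abstract's pipeline concrete, the sketch below illustrates in Python how its three ingredients might fit together: a real-and-imaginary STFT representation, an LSTM encoder-decoder, and paired training data rendered from MIDI. This is a minimal illustration under assumed choices only; the libraries (librosa, TensorFlow/Keras), hyperparameters, and file names are assumptions for demonstration, not values taken from the paper.

```python
# Minimal sketch of the pipeline described in the abstract: audio is
# represented as the real and imaginary parts of its STFT, and an LSTM
# encoder-decoder maps source-style frames to target-style frames.
# All hyperparameters below are illustrative assumptions, not the
# paper's actual settings.
import numpy as np
import librosa
from tensorflow.keras import layers, Model

N_FFT = 1024                          # assumed FFT window size
HOP = 256                             # assumed hop length
FEATURE_DIM = 2 * (N_FFT // 2 + 1)    # real + imaginary bins per frame

def audio_to_features(path):
    """Stack real and imaginary STFT parts so phase is modeled explicitly."""
    y, _ = librosa.load(path, sr=22050, mono=True)
    spec = librosa.stft(y, n_fft=N_FFT, hop_length=HOP)
    # Shape (frames, FEATURE_DIM): each frame carries both components.
    return np.concatenate([spec.real.T, spec.imag.T], axis=-1)

def features_to_audio(feats):
    """Invert the stacked representation back to a waveform."""
    half = feats.shape[-1] // 2
    spec = (feats[:, :half] + 1j * feats[:, half:]).T
    return librosa.istft(spec, hop_length=HOP)

def build_model(latent_dim=512):
    """LSTM encoder-decoder over sequences of STFT frames."""
    inputs = layers.Input(shape=(None, FEATURE_DIM))
    # Encoder: summarize the input sequence into per-frame latent states.
    encoded = layers.LSTM(latent_dim, return_sequences=True)(inputs)
    # Decoder: expand the latents back into target-style STFT frames.
    decoded = layers.LSTM(latent_dim, return_sequences=True)(encoded)
    outputs = layers.TimeDistributed(layers.Dense(FEATURE_DIM))(decoded)
    return Model(inputs, outputs)

model = build_model()
model.compile(optimizer="adam", loss="mse")

# Paired training data, per the abstract, can be synthesized from MIDI:
# rendering the same MIDI file in two styles yields aligned
# (source, target) feature sequences. File names are hypothetical.
# src = audio_to_features("render_style_a.wav")
# tgt = audio_to_features("render_style_b.wav")
# model.fit(src[np.newaxis], tgt[np.newaxis], epochs=10)
```

Keeping the real and imaginary parts as explicit features means the network models phase alongside magnitude, so a waveform can be recovered with a single inverse STFT rather than an iterative phase-reconstruction step.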




Author information

Correspondence to Mateusz Modrzejewski.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Modrzejewski, M., Bereda, K., Rokita, P. (2021). Efficient Recurrent Neural Network Architecture for Musical Style Transfer. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, vol 12854. Springer, Cham. https://doi.org/10.1007/978-3-030-87986-0_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-87986-0_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-87985-3

  • Online ISBN: 978-3-030-87986-0

  • eBook Packages: Computer Science, Computer Science (R0)
