Abstract
In this paper we present an original method for style transfer between music tracks. We use a recurrent model consisting of LSTM layers within an encoder-decoder architecture. We also present a method for the programmatic synthesis of sufficiently large, paired training datasets from MIDI data. Representing the data as the real and imaginary parts of the short-time Fourier transform allows the components of the music to be modeled independently. The proposed architecture improves upon state-of-the-art solutions in terms of efficiency and range of applications while achieving high network precision.
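As a rough illustration of the input representation the abstract describes, the sketch below computes the real and imaginary parts of the short-time Fourier transform and stacks them as two channels, one time step per STFT frame, so a recurrent encoder-decoder can consume them. This is our own minimal sketch, not the authors' code; the use of librosa, the window parameters, and the function names are assumptions for illustration.

```python
import numpy as np
import librosa


def stft_real_imag(path, n_fft=2048, hop_length=512):
    """Load audio and return frames of shape (time, bins, 2).

    The last axis holds the real and imaginary STFT components.
    n_fft and hop_length are illustrative choices, not taken from the paper.
    """
    y, sr = librosa.load(path, sr=None, mono=True)
    spec = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)  # (bins, time), complex
    # Stack real/imaginary as channels and put time first for an LSTM.
    return np.stack([spec.real, spec.imag], axis=-1).transpose(1, 0, 2)


def real_imag_to_audio(frames, hop_length=512):
    """Invert the (time, bins, 2) representation back to a waveform.

    Because phase is carried by the imaginary part, no phase-reconstruction
    step (e.g. Griffin-Lim) is needed, unlike magnitude-spectrogram pipelines.
    """
    spec = frames[..., 0] + 1j * frames[..., 1]
    return librosa.istft(spec.transpose(1, 0), hop_length=hop_length)
```

A network trained on such frames can predict the real and imaginary channels separately, which is one plausible reading of the "independent modeling of the music components" the abstract mentions.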
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Modrzejewski, M., Bereda, K., Rokita, P. (2021). Efficient Recurrent Neural Network Architecture for Musical Style Transfer. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds.) Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, vol. 12854. Springer, Cham. https://doi.org/10.1007/978-3-030-87986-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87985-3
Online ISBN: 978-3-030-87986-0
eBook Packages: Computer Science (R0)