Towards End-to-End Raw Audio Music Synthesis

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11141)

Abstract

In this paper, we address the problem of automated music synthesis using deep neural networks. We ask whether neural networks that process raw audio data can achieve accurate timing, accurate pitch, and pattern generalization in automated music generation. To this end, we present a proof of concept: a recurrent neural network architecture that generalizes to produce appropriate musical raw audio tracks.

Notes

  1. https://redd.it/3ajwe4, accessed 18/01/18.
  2. http://www.publications.eppe.eu/data/Giuliani_Op74_No15_Andantino_grazioso_merged
  3. http://www.publications.eppe.eu/data/The_Beatles_Ob-La-Di_Ob-La-Da_merged.wav
  4. http://www.publications.eppe.eu/data/Bob_Dylan_Positively_4th_Street_merged.wav

Acknowledgments

The authors gratefully acknowledge partial support from the German Research Foundation (DFG) under project CML (TRR 169) and from the European Union under project SECURE (No. 642667).

Author information

Correspondence to Manfred Eppe.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Eppe, M., Alpay, T., Wermter, S. (2018). Towards End-to-End Raw Audio Music Synthesis. In: Kůrková, V., Manolopoulos, Y., Hammer, B., Iliadis, L., Maglogiannis, I. (eds) Artificial Neural Networks and Machine Learning – ICANN 2018. ICANN 2018. Lecture Notes in Computer Science, vol 11141. Springer, Cham. https://doi.org/10.1007/978-3-030-01424-7_14

  • DOI: https://doi.org/10.1007/978-3-030-01424-7_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01423-0

  • Online ISBN: 978-3-030-01424-7

  • eBook Packages: Computer Science, Computer Science (R0)
