
A Divide and Conquer Approach to Automatic Music Transcription Using Neural Networks

  • Conference paper
  • Published in: Progress in Artificial Intelligence (EPIA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11805)

Abstract

This paper describes a new approach to the automatic music transcription problem. We take advantage of the divide-and-conquer design paradigm and create several artificial neural networks, each responsible for transcribing a single musical note. In this way, we depart from the traditional approach, which resorts to a single classifier for transcribing all musical notes. To further improve results, an additional post-processing stage, built with artificial neural networks under the same design paradigm, is also proposed. This last stage comprises three main steps: (1) fixing note durations, (2) fixing note durations with respect to onsets, and (3) fixing onsets. The obtained results show that these steps were essential to improving the final transcription. We also compare our results with existing neural network-based approaches. Our approach surpasses current state-of-the-art works in frame-based results while reaching similar onset-only results, thus demonstrating its viability.
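To make the divide-and-conquer idea concrete, the sketch below shows one possible shape of a per-note classifier ensemble: one small binary network per note, whose frame-wise decisions are stacked into a piano roll. This is a minimal illustration, not the authors' implementation; the spectrogram-frame input, the 88-note piano range, the layer sizes and the choice of PyTorch are all assumptions, and the three post-processing networks are omitted.

```python
# Hypothetical sketch of the per-note, divide-and-conquer idea described in the abstract.
# Assumptions (not taken from the paper): spectrogram frames as input features,
# an 88-note piano range, and one small binary classifier per note.
import torch
import torch.nn as nn

N_NOTES = 88       # assumed piano range
N_FEATURES = 252   # assumed number of spectrogram bins per frame


class NoteClassifier(nn.Module):
    """Binary classifier deciding whether one specific note is active in a frame."""

    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.ReLU(),
            nn.Dropout(0.5),    # generic regularisation; actual settings unknown
            nn.Linear(128, 1),  # single output: note on / off
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# One independent network per note, following the divide-and-conquer idea.
classifiers = [NoteClassifier(N_FEATURES) for _ in range(N_NOTES)]

# Transcribing a batch of frames: each classifier votes only on its own note,
# and the stacked outputs form a frame-level piano roll.
frames = torch.randn(16, N_FEATURES)  # placeholder input batch of 16 frames
piano_roll = torch.sigmoid(
    torch.cat([clf(frames) for clf in classifiers], dim=1)  # shape (16, 88)
) > 0.5
```

In this scheme each network only has to answer a binary "is my note active in this frame?" question, which is the decomposition the divide-and-conquer design exploits; the three post-processing steps described in the abstract would then operate on the resulting piano roll.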

Author information


Corresponding author

Correspondence to Carlos Grilo.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gil, A., Grilo, C., Reis, G., Domingues, P. (2019). A Divide and Conquer Approach to Automatic Music Transcription Using Neural Networks. In: Moura Oliveira, P., Novais, P., Reis, L. (eds.) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol. 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_19

  • DOI: https://doi.org/10.1007/978-3-030-30244-3_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30243-6

  • Online ISBN: 978-3-030-30244-3

  • eBook Packages: Computer Science; Computer Science (R0)
