Abstract
This paper describes a new approach to the automatic music transcription problem. We take advantage of the divide-and-conquer design paradigm and create several artificial neural networks, each responsible for transcribing one musical note. In this way, we depart from the traditional approach, which resorts to a single classifier for transcribing all musical notes. To further improve results, we also propose an additional post-processing stage that uses artificial neural networks built with the same design paradigm. This last stage comprises three main steps: (1) fix note durations, (2) fix note durations with respect to onsets, and (3) fix onsets. The obtained results show that these steps were essential to improving the final transcription. We also compare our results with existing neural network-based approaches. Our approach surpasses current state-of-the-art works in frame-based results while reaching similar onset-only results, thus demonstrating its viability.
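The divide-and-conquer idea in the abstract (one network per note, rather than a single classifier for all notes) can be pictured with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the 88-note range, the one-hidden-layer architecture, the feature size, the 0.5 threshold, and the names `NoteDetector` and `transcribe` are all hypothetical. The weights are random and untrained, so the sketch shows only the structure, not the authors' models or their post-processing stage.

```python
import numpy as np

N_NOTES = 88  # assumption: full piano range; the abstract does not state the note count


class NoteDetector:
    """Hypothetical tiny network that decides whether ONE note is active in a frame."""

    def __init__(self, n_features, n_hidden, rng):
        # Random, untrained weights: this only illustrates the structure.
        self.w1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def predict_proba(self, frames):
        # frames: (n_frames, n_features) -> probabilities: (n_frames,)
        hidden = np.tanh(frames @ self.w1 + self.b1)
        logits = hidden @ self.w2 + self.b2
        return 1.0 / (1.0 + np.exp(-logits))


def transcribe(frames, detectors, threshold=0.5):
    """Run every per-note detector independently and stack the results
    into a boolean piano roll of shape (n_frames, n_notes)."""
    probs = np.stack([d.predict_proba(frames) for d in detectors], axis=1)
    return probs >= threshold


rng = np.random.default_rng(0)
detectors = [NoteDetector(n_features=16, n_hidden=4, rng=rng) for _ in range(N_NOTES)]
frames = rng.random((10, 16))  # stand-in for 10 spectral frames
roll = transcribe(frames, detectors)
print(roll.shape)
```

Because each detector is independent, each one could in principle be trained, tuned, or replaced separately, which is the property the divide-and-conquer framing relies on.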
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gil, A., Grilo, C., Reis, G., Domingues, P. (2019). A Divide and Conquer Approach to Automatic Music Transcription Using Neural Networks. In: Moura Oliveira, P., Novais, P., Reis, L. (eds) Progress in Artificial Intelligence. EPIA 2019. Lecture Notes in Computer Science, vol 11805. Springer, Cham. https://doi.org/10.1007/978-3-030-30244-3_19
DOI: https://doi.org/10.1007/978-3-030-30244-3_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30243-6
Online ISBN: 978-3-030-30244-3
eBook Packages: Computer Science, Computer Science (R0)