Abstract
Optical Music Recognition is the research field that investigates how to computationally read music notation from document images. State-of-the-art technologies, based on Convolutional Recurrent Neural Networks, typically follow an end-to-end approach that operates at the staff level; i.e., a single stage processes the image of a single staff and retrieves the series of symbols that appear therein. This type of model demands a training set of sufficient size; however, many music manuscripts are small collections, which calls into question the usefulness of this framework. To address this drawback, we propose a classification-based approach for music documents that processes the staff image sequentially. This is achieved by predicting, in the proper reading order, the symbol locations and their corresponding music-notation labels. Our experimental results report a noticeable improvement over previous attempts in scenarios of limited ground truth (for instance, decreasing the Symbol Error Rate from 70% to 37% with just 80 training staves), while still attaining competitive performance as the training set size increases.
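The abstract reports results as Symbol Error Rate but does not define it. The sketch below assumes the definition commonly used in sequence recognition: the number of symbol-level edit operations (insertions, deletions, substitutions) needed to turn the predicted sequence into the reference one, divided by the total reference length and expressed as a percentage. The function names and the symbol labels are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def symbol_error_rate(refs: List[Sequence[str]],
                      hyps: List[Sequence[str]]) -> float:
    """Assumed SER definition: total edits over total reference symbols, in %."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of staves")
    total_len = sum(len(r) for r in refs)
    if total_len == 0:
        raise ValueError("reference sequences are empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_len


# Hypothetical labels for one staff; the prediction misses one symbol.
reference = [["clef.C", "note.half", "note.quarter", "rest.whole"]]
predicted = [["clef.C", "note.quarter", "rest.whole"]]
print(symbol_error_rate(reference, predicted))  # 25.0
```

Under this assumed definition, the reported drop from 70% to 37% would mean that roughly half as many symbol-level corrections are needed per reference symbol.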
Notes
1. Music notation system used for most of the 16th and 17th centuries in Europe.
Acknowledgments
This work was supported by the Generalitat Valenciana through project GV/2020/030. The second author acknowledges support from the Spanish Ministerio de Universidades through grant FPU19/04957.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mas-Candela, E., Alfaro-Contreras, M., Calvo-Zaragoza, J. (2021). Sequential Next-Symbol Prediction for Optical Music Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_46
DOI: https://doi.org/10.1007/978-3-030-86334-0_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science, Computer Science (R0)