Abstract
Optical Music Recognition (OMR) is an interdisciplinary field that aims to automate the transcription of sheet music into a digital format. Over the past few years, significant progress has been made in developing OMR systems that recognize musical symbols with high accuracy. However, completing the full OMR pipeline remains challenging due to the complexity and variability of music notation, and several open problems still need to be addressed. In this position paper, we provide an overview of the current state of the art in OMR, organized around its two main lines of research, covering the problems that have recently been addressed and the techniques that have been considered. We then identify the key challenges that remain, such as learning to reconstruct the music notation, recognizing multiple voices, and dealing with artifacts such as lyrics. Finally, we suggest some possible directions for future research. We argue that addressing these challenges is crucial to making OMR a more practical and useful tool for musicians, scholars, and librarians alike.
Work produced with the support of a 2021 Leonardo Grant for Researchers and Cultural Creators, BBVA Foundation. The Foundation takes no responsibility for the opinions, statements and contents of this project, which are entirely the responsibility of its authors.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Calvo-Zaragoza, J., Martinez-Sevilla, J.C., Penarrubia, C., Rios-Vila, A. (2023). Optical Music Recognition: Recent Advances, Current Challenges, and Future Directions. In: Coustaty, M., Fornés, A. (eds) Document Analysis and Recognition – ICDAR 2023 Workshops. ICDAR 2023. Lecture Notes in Computer Science, vol 14193. Springer, Cham. https://doi.org/10.1007/978-3-031-41498-5_7
DOI: https://doi.org/10.1007/978-3-031-41498-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41497-8
Online ISBN: 978-3-031-41498-5
eBook Packages: Computer Science (R0)