Abstract
Optical Music Recognition (OMR) workflows currently involve several steps to retrieve information from music documents, focusing on image analysis and symbol recognition. Despite many efforts, however, there is little research on how to put these recognition results into practice, because one step has not been properly discussed: encoding into standard music formats and integrating that encoding within OMR workflows to produce practical results that end users can benefit from. In this paper, we address this topic and propose options for completing OMR, ultimately exporting the score image into a standard digital format. Specifically, we discuss the possibility of attaching Machine Translation systems to the recognition pipeline to perform the encoding step. After discussing the most appropriate systems for the process and proposing two options for the translation, we evaluate their performance against a direct-encoding pipeline. Our results confirm that the proposed addition to the pipeline is a feasible and interesting solution for complete OMR processes, especially when limited training data is available, which is a common scenario in music heritage.
Notes
- 1. Short sequence of notes, typically the first ones, used for identifying a melody or musical work.
- 2.
Acknowledgments
This work was supported by the Generalitat Valenciana through project GV/2020/030.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Ríos-Vila, A., Rizo, D., Calvo-Zaragoza, J. (2021). Complete Optical Music Recognition via Agnostic Transcription and Machine Translation. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_43
DOI: https://doi.org/10.1007/978-3-030-86334-0_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer Science (R0)