Complete Optical Music Recognition via Agnostic Transcription and Machine Translation

Ríos-Vila, Antonio; Rizo, David; Calvo-Zaragoza, Jorge

doi:10.1007/978-3-030-86334-0_43

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12823))

Included in the following conference series:

International Conference on Document Analysis and Recognition

3460 Accesses
3 Citations

Abstract

Optical Music Recognition workflows currently involve several steps to retrieve information from music documents, focusing on image analysis and symbol recognition. However, despite many efforts, there is little research on how to bring these recognition results to practice, as there is still one step that has not been properly discussed: the encoding into standard music formats and its integration within OMR workflows to produce practical results that end-users could benefit from. In this paper, we approach this topic and propose options for completing OMR, eventually exporting the score image into a standard digital format. Specifically, we discuss the possibility of attaching Machine Translation systems to the recognition pipeline to perform the encoding step. After discussing the most appropriate systems for the process and proposing two options for the translation, we evaluate its performance in contrast to a direct-encoding pipeline. Our results confirm that the proposed addition to the pipeline establishes itself as a feasible and interesting solution for complete OMR processes, especially when limited training data is available, which represents a common scenario in music heritage.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Short sequence of notes, typically the first ones, used for identifying a melody or musical work.
2.
https://musicatradicional.eu.

References

Répertoire International des Sources Musicales (RISM) Series A/II: Music manuscripts after 1600 on CD-ROM. Technical report (2005)
Google Scholar
Baró, A., Badal, C., Fornés, A.: Handwritten historical music recognition by sequence-to-sequence with attention mechanism. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 205–210 (2020)
Google Scholar
Burgoyne, J.A., Devaney, J., Pugin, L., Fujinaga, I.: Enhanced bleedthrough correction for early music documents with recto-verso registration. In: Bello, J.P., Chew, E., Turnbull, D. (eds.) ISMIR 2008, 9th International Conference on Music Information Retrieval, Drexel University, Philadelphia, PA, USA, 14–18 September 2008, pp. 407–412 (2008)
Google Scholar
Burnard, L., Bauman, S. (eds.): A gentle introduction to XML. Text encoding initiative consortium. In: TEI P5: Guidelines for Electronic Text Encoding and Interchange (2007). http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SG.html
Byrd, D., Simonsen, J.: Towards a standard testbed for optical music recognition: definitions, metrics, and page images. J. New Music Res. 44, 169–195 (2015). https://doi.org/10.1080/09298215.2015.1045424
Article Google Scholar
Calvo-Zaragoza, J., Hajič Jr., J., Pacha, A.: Understanding optical music recognition. ACM Comput. Surv. 53(4) (2020)
Google Scholar
Calvo-Zaragoza, J., Rizo, D.: Camera-PrIMuS: neural end-to-end optical music recognition on realistic monophonic scores. In: Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, 23–27 September 2018, pp. 248–255 (2018)
Google Scholar
Calvo-Zaragoza, J., Toselli, A.H., Vidal, E.: Handwritten music recognition for mensural notation with convolutional recurrent neural networks. Pattern Recogn. Lett. 128, 115–121 (2019)
Article Google Scholar
Clares Clares, E.: Canción de trilla. Fondo de música tradicional IMF-CSIC. https://musicatradicional.eu/es/piece/12551. Accessed 01 Feb 2021
Dalitz, C., Droettboom, M., Pranzas, B., Fujinaga, I.: A comparative study of staff removal algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 753–766 (2008)
Article Google Scholar
Good, M., Actor, G.: Using MusicXML for file interchange. In: International Conference on Web Delivering of Music 0, 153 (2003)
Google Scholar
Graves, A., Fernández, S., Gomez, F.J., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the Twenty-Third International Conference on Machine Learning, (ICML 2006), Pittsburgh, Pennsylvania, USA, 25–29 June 2006, pp. 369–376 (2006)
Google Scholar
Hajic, J., Pecina, P.: The MUSCIMA++ dataset for handwritten optical music recognition. In: ICDAR (2017)
Google Scholar
Hankinson, A., Roland, P., Fujinaga, I.: The music encoding initiative as a document-encoding framework. In: Proceedings of the 12th International Society for Music Information Retrieval Conference (2011)
Google Scholar
Huron, D.: Humdrum and Kern: Selective Feature Encoding, pp. 375–401. MIT Press, Cambridge (1997)
Google Scholar
Koehn, P.: Statistical Machine Translation. Cambridge University Press, Cambridge (2009)
Google Scholar
Koehn, P., et al.: Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pp. 177–180. Association for Computational Linguistics, Prague (2007)
Google Scholar
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Màrquez, L., Callison-Burch, C., Su, J., Pighin, D., Marton, Y. (eds.) Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 17–21 September 2015, pp. 1412–1421. The Association for Computational Linguistics (2015)
Google Scholar
Pacha, A., Calvo-Zaragoza, J., Hajič jr., J.: Learning notation graph construction for full-pipeline optical music recognition. In: 20th International Society for Music Information Retrieval Conference, pp. 75–82 (2019)
Google Scholar
Pacha, A., Eidenberger, H.: Towards a universal music symbol classifier. In: 14th International Conference on Document Analysis and Recognition, pp. 35–36. IAPR TC10 (Technical Committee on Graphics Recognition), IEEE Computer Society, Kyoto (2017)
Google Scholar
Pacha, A., Hajič, J., Calvo-Zaragoza, J.: A baseline for general music object detection with deep learning. Appl. Sci. 8(9), 1488 (2018)
Article Google Scholar
Parada-Cabaleiro, E., Batliner, A., Schuller, B.W.: A diplomatic edition of il lauro secco: ground truth for OMR of white mensural notation. In: Flexer, A., Peeters, G., Urbano, J., Volk, A. (eds.) Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019, Delft, The Netherlands, 4–8 November 2019, pp. 557–564 (2019)
Google Scholar
Pugin, L., Zitellini, R., Roland, P.: Verovio: a library for engraving MEI music notation into SVG. In: Proceedings of the 15th International Society for Music Information Retrieval Conference, pp. 107–112. ISMIR, October 2014
Google Scholar
Quirós, L., Toselli, A.H., Vidal, E.: Multi-task layout analysis of handwritten musical scores. In: Morales, A., Fierrez, J., Sánchez, J.S., Ribeiro, B. (eds.) IbPRIA 2019. LNCS, vol. 11868, pp. 123–134. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31321-0_11
Chapter Google Scholar
Ríos-Vila, A., Calvo-Zaragoza, J., Rizo, D.: Evaluating simultaneous recognition and encoding for optical music recognition. In: 7th International Conference on Digital Libraries for Musicology, DLfM 2020, pp. 10–17. Association for Computing Machinery, New York (2020)
Google Scholar
Ríos-Vila, A., Esplà-Gomis, M., Rizo, D., Ponce de León, P.J., Iñesta, J.M.: Applying automatic translation for optical music recognition’s encoding step. Appl. Sci. 11(9) (2021)
Google Scholar
Sánchez, J., Romero, V., Toselli, A.H., Villegas, M., Vidal, E.: A set of benchmarks for handwritten text recognition on historical documents. Pattern Recognit. 94, 122–134 (2019)
Article Google Scholar
Sapp, C.S.: Verovio humdrum viewer. In: Proceedings of Music Encoding Conference (MEC), Tours, France (2017)
Google Scholar
Tuggener, L., Elezi, I., Schmidhuber, J., Stadelmann, T.: Deep watershed detector for music object recognition. In: 19th International Society for Music Information Retrieval Conference, Paris, 23–27 September 2018 (2018)
Google Scholar
Vaswani, A., et al.: Attention is all you need (2017)
Google Scholar

Download references

Acknowledgments

This work was supported by the Generalitat Valenciana through project GV/2020/030.

Author information

Authors and Affiliations

U.I. for Computing Research, University of Alicante, Alicante, Spain
Antonio Ríos-Vila, David Rizo & Jorge Calvo-Zaragoza
Instituto Superior de Enseñanzas Artísticas de la Comunidad Valenciana (ISEA.CV), Valencia, Spain
David Rizo

Authors

Antonio Ríos-Vila
View author publications
You can also search for this author in PubMed Google Scholar
David Rizo
View author publications
You can also search for this author in PubMed Google Scholar
Jorge Calvo-Zaragoza
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Antonio Ríos-Vila .

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Ríos-Vila, A., Rizo, D., Calvo-Zaragoza, J. (2021). Complete Optical Music Recognition via Agnostic Transcription and Machine Translation. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science(), vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_43

Download citation

DOI: https://doi.org/10.1007/978-3-030-86334-0_43
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86333-3
Online ISBN: 978-3-030-86334-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)