Complete Optical Music Recognition via Agnostic Transcription and Machine Translation

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Abstract

Optical Music Recognition (OMR) workflows currently involve several steps to retrieve information from music documents, focusing on image analysis and symbol recognition. However, despite many efforts, there is little research on how to put these recognition results into practice, as one step has still not been properly discussed: the encoding into standard music formats and its integration within OMR workflows to produce practical results that end users can benefit from. In this paper, we approach this topic and propose options for completing OMR, so that the score image is ultimately exported into a standard digital format. Specifically, we discuss the possibility of attaching Machine Translation systems to the recognition pipeline to perform the encoding step. After discussing the most appropriate systems for the process and proposing two options for the translation, we evaluate their performance against a direct-encoding pipeline. Our results confirm that the proposed addition to the pipeline is a feasible and interesting solution for complete OMR processes, especially when limited training data is available, which is a common scenario in music heritage.

Notes

  1. Short sequence of notes, typically the first ones, used for identifying a melody or musical work.

  2. https://musicatradicional.eu.

Acknowledgments

This work was supported by the Generalitat Valenciana through project GV/2020/030.

Author information

Corresponding author

Correspondence to Antonio Ríos-Vila.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Ríos-Vila, A., Rizo, D., Calvo-Zaragoza, J. (2021). Complete Optical Music Recognition via Agnostic Transcription and Machine Translation. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_43

  • DOI: https://doi.org/10.1007/978-3-030-86334-0_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science, Computer Science (R0)
