Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network

  • Conference paper
Frontiers in Handwriting Recognition (ICFHR 2022)

Abstract

Over the last decades, the performance of optical music recognition has steadily improved. However, despite the two-dimensional nature of music notation (e.g. notes have both rhythm and pitch), most works treat musical scores as a one-dimensional sequence of symbols, which keeps their recognition a challenge. In this work we therefore explore the use of graph neural networks for musical score recognition: first, because graphs are suited to n-dimensional representations, and second, because the combination of graphs with deep learning has shown strong performance in similar applications. Our methodology consists of three steps. First, we detect each isolated (atomic) symbol, meaning one that cannot be decomposed into further graphical primitives, together with the primitives that form each compound musical symbol. Then, we build a graph that takes the notehead as its root node and, as leaves, those primitives or symbols that modify the note’s rhythm (stem, beam, flag) or pitch (flat, sharp, natural). Finally, the graph is translated into a human-readable character sequence for the final transcription and evaluation. Our method has been tested on more than five thousand measures, showing promising results.
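The graph-building and translation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not use a graph neural network: a nearest-notehead distance rule stands in for the edge decision, purely so the example can run. The `Detection` and `NoteGraph` types, the `max_dist` threshold, the pixel coordinates, and the `note[...]` token format are all hypothetical; only the symbol classes and their split into rhythm and pitch modifiers come from the abstract.

```python
import math
from dataclasses import dataclass, field

# Symbol classes named in the abstract, grouped by what they modify.
RHYTHM = {"stem", "beam", "flag"}
PITCH = {"flat", "sharp", "natural"}


@dataclass
class Detection:
    """One detected primitive (hypothetical type): class label and box centre."""
    label: str
    x: float
    y: float


@dataclass
class NoteGraph:
    """Star-shaped graph: a notehead root and its modifier leaves."""
    root: Detection
    leaves: list = field(default_factory=list)


def build_graphs(detections, max_dist=40.0):
    """Attach each modifier to its nearest notehead.

    Assumption: a plain distance threshold replaces the learned edge
    decision; max_dist is an arbitrary illustrative value.
    """
    graphs = [NoteGraph(root=d) for d in detections if d.label == "notehead"]
    for d in detections:
        if d.label not in RHYTHM | PITCH or not graphs:
            continue
        nearest = min(
            graphs,
            key=lambda g: math.hypot(g.root.x - d.x, g.root.y - d.y),
        )
        if math.hypot(nearest.root.x - d.x, nearest.root.y - d.y) <= max_dist:
            nearest.leaves.append(d)
    return graphs


def to_sequence(graphs):
    """Serialise graphs left to right into a hypothetical token format."""
    tokens = []
    for g in sorted(graphs, key=lambda g: g.root.x):
        pitch = sorted(l.label for l in g.leaves if l.label in PITCH)
        rhythm = sorted(l.label for l in g.leaves if l.label in RHYTHM)
        tokens.append("note[" + ",".join(pitch + rhythm) + "]")
    return " ".join(tokens)


detections = [
    Detection("notehead", 10, 50),
    Detection("stem", 14, 30),
    Detection("sharp", 0, 50),
    Detection("notehead", 60, 40),
    Detection("stem", 64, 20),
    Detection("flag", 68, 10),
]
print(to_sequence(build_graphs(detections)))
# prints: note[sharp,stem] note[flag,stem]
```

Rooting each graph at the notehead keeps every note's pitch and rhythm modifiers together, so the final sequence can be produced by visiting the roots in reading order.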

Notes

  1. http://www.cvc.uab.es/people/abaro/ in Datasets.

  2. https://lilypond.org

Acknowledgment

This work has been partially supported by the Spanish projects RTI2018-095645-B-C21 and PID2021-126808OB-I00, the CERCA Program/Generalitat de Catalunya, and the FI fellowship AGAUR 2020 FI_B2 00149 (with the support of the Secretaria d’Universitats i Recerca of the Generalitat de Catalunya and the Fons Social Europeu). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.

Author information

Corresponding author

Correspondence to Arnau Baró.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Baró, A., Riba, P., Fornés, A. (2022). Musigraph: Optical Music Recognition Through Object Detection and Graph Neural Network. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_12

  • DOI: https://doi.org/10.1007/978-3-031-21648-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21647-3

  • Online ISBN: 978-3-031-21648-0

  • eBook Packages: Computer Science (R0)
