End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization

  • Conference paper
Pattern Recognition and Computer Vision (PRCV 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Abstract

Optical Music Recognition (OMR) is a research field that studies how computers can read the notation in sheet music documents. In this paper, we propose ATTML, an end-to-end OMR model based on memory-unit optimization and attention mechanisms. First, we replace the standard LSTM memory unit with a Mogrifier LSTM, which lets the input and hidden states interact fully before each update and thus yields better context-dependent representations. Second, we augment the decoder with the ECA (efficient channel attention) mechanism, enabling the model to focus on salient features and patterns in the input. We train jointly on three existing music datasets: PrIMuS, DoReMi, and DeepScores. We conduct ablation experiments with a range of attention mechanisms and memory units. Furthermore, we use SnSl, a musical score density metric, to compare our model with others and to measure its performance on dense scores in particular. Comparative and ablation experiments show that the proposed method outperforms previous state-of-the-art methods in accuracy and robustness.
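The abstract names its two building blocks without spelling them out. The dependency-free Python sketch below shows their mechanics as described in the original Mogrifier LSTM paper (Melis et al.) and ECA-Net paper (Wang et al.): mutual gating of the input and hidden state before an ordinary LSTM update, and channel weights computed by a small 1-D convolution over globally pooled channel descriptors. It is not the ATTML implementation. All function names, the toy dimensions, the random weights, and the number of mogrification rounds are illustrative assumptions, and how ATTML places these components in its encoder and decoder is not modelled here.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m, v):
    """Multiply a matrix given as a list of rows by a vector given as a list."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def mogrify(x, h, q_mats, r_mats, rounds):
    """Mogrifier gating: x and h rescale each other in alternating rounds.

    Odd round i:  x <- 2 * sigmoid(Q^i h) * x  (each Q^i is len(x) x len(h))
    Even round i: h <- 2 * sigmoid(R^i x) * h  (each R^i is len(h) x len(x))
    """
    x, h = list(x), list(h)
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            gate = matvec(q_mats[i // 2], h)
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(gate, x)]
        else:
            gate = matvec(r_mats[i // 2 - 1], x)
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(gate, h)]
    return x, h

def lstm_step(x, h, c, w, b):
    """Standard LSTM cell: w has 4*H rows over [x; h], gate order i, f, g, o."""
    n = len(h)
    z = [s + bias for s, bias in zip(matvec(w, x + h), b)]
    i_g = [sigmoid(v) for v in z[0:n]]
    f_g = [sigmoid(v) for v in z[n:2 * n]]
    g_g = [math.tanh(v) for v in z[2 * n:3 * n]]
    o_g = [sigmoid(v) for v in z[3 * n:4 * n]]
    c_new = [f * cp + i * g for f, cp, i, g in zip(f_g, c, i_g, g_g)]
    h_new = [o * math.tanh(cv) for o, cv in zip(o_g, c_new)]
    return h_new, c_new

def mogrifier_lstm_step(x, h, c, q_mats, r_mats, rounds, w, b):
    """Mogrify the input and previous hidden state, then apply the usual LSTM update."""
    x_m, h_m = mogrify(x, h, q_mats, r_mats, rounds)
    return lstm_step(x_m, h_m, c, w, b)

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA-Net's adaptive kernel size: (log2(C) + b) / gamma, truncated, bumped to odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1

def eca(feature_map, kernel):
    """Efficient channel attention over C channels, each a flat list of spatial values.

    Global average pooling per channel, then a 1-D convolution across
    neighbouring channels (zero padding, no bias, no bottleneck), then a
    sigmoid; each channel is rescaled by its resulting weight.
    """
    k = len(kernel)
    pad = (k - 1) // 2
    pooled = [sum(ch) / len(ch) for ch in feature_map]
    padded = [0.0] * pad + pooled + [0.0] * pad
    weights = [
        sigmoid(sum(kernel[j] * padded[c + j] for j in range(k)))
        for c in range(len(feature_map))
    ]
    return [[wt * v for v in ch] for wt, ch in zip(weights, feature_map)]

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy sizes and random weights, not ATTML's configuration.
    dx, dh, rounds = 3, 4, 5

    def rand_mat(rows, cols):
        return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    q_mats = [rand_mat(dx, dh) for _ in range((rounds + 1) // 2)]
    r_mats = [rand_mat(dh, dx) for _ in range(rounds // 2)]
    w, b = rand_mat(4 * dh, dx + dh), [0.0] * (4 * dh)
    h, c = mogrifier_lstm_step([1.0, 0.5, -0.2], [0.1, -0.3, 0.2, 0.05], [0.0] * dh,
                               q_mats, r_mats, rounds, w, b)
    print("hidden state:", [round(v, 3) for v in h])

    fmap = [[random.random() for _ in range(6)] for _ in range(8)]  # 8 channels, 6 positions
    kernel = [random.uniform(-1.0, 1.0) for _ in range(eca_kernel_size(len(fmap)))]
    out = eca(fmap, kernel)
    print("ECA kernel size:", len(kernel), "| channels kept:", len(out))
```

Running the file prints one mogrified LSTM hidden state and confirms that ECA keeps the channel count unchanged. With all Q and R entries set to zero, every gate equals 2 · sigmoid(0) = 1, so mogrification leaves x and h untouched; the learned matrices are what let the two states reshape each other. Unlike squeeze-and-excitation, ECA has no fully connected bottleneck: each channel weight depends only on the pooled averages of its k neighbouring channels.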

References

  1. Shatri, E., Fazekas, G.: Optical music recognition: state of the art and major challenges (2020). arXiv:abs/2006.07885

  2. Pacha, A., Calvo-Zaragoza, J., Hajič Jr., J.: Learning notation graph construction for full-pipeline optical music recognition. In: Proceedings of the 20th International Society for Music Information Retrieval Conference, pp. 75–82. ISMIR, Delft, The Netherlands (2019). https://doi.org/10.5281/zenodo.3527744

  3. Dorfer, M., Arzt, A., Widmer, G.: Learning audio-sheet music correspondences for score identification and offline alignment. In: International Society for Music Information Retrieval Conference (2017)

  4. Moss, F.C., Köster, M., Femminis, M., Métrailler, C., Bavaud, F.: Digitizing a 19th-century music theory debate for computational analysis. CEUR Workshop Proceedings, vol. 2989, pp. 159–170 (2021). http://infoscience.epfl.ch/record/289818

  5. Géraud, T.: A morphological method for music score staff removal. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 2599–2603 (2014). https://doi.org/10.1109/ICIP.2014.7025526

  6. Edirisooriya, S., Dong, H.W., McAuley, J., Berg-Kirkpatrick, T.: An empirical evaluation of end-to-end polyphonic optical music recognition (2021)

  7. Dosovitskiy, A., et al.: FlowNet: learning optical flow with convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)

  8. Pacha, A., Eidenberger, H.: Towards a universal music symbol classifier. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 02, pp. 35–36 (2017). https://doi.org/10.1109/ICDAR.2017.265

  9. Raphael, C., Wang, J.: New approaches to optical music recognition. In: Proceedings of the 12th International Society for Music Information Retrieval Conference, pp. 305–310. ISMIR, Miami, United States (2011). https://doi.org/10.5281/zenodo.1414856

  10. Kaliakatsos-Papakostas, M.A., Epitropakis, M.G., Vrahatis, M.N.: Musical composer identification through probabilistic and feedforward neural networks. In: Di Chio, C., Brabazon, A., Di Caro, G.A., Ebner, M., Farooq, M., Fink, A., Grahl, J., Greenfield, G., Machado, P., O’Neill, M., Tarantino, E., Urquhart, N. (eds.) EvoApplications 2010. LNCS, vol. 6025, pp. 411–420. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12242-2_42

  11. Ríos-Vila, A., Calvo-Zaragoza, J., Iñesta, J.M.: Exploring the two-dimensional nature of music notation for score recognition with end-to-end approaches. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 193–198 (2020). https://doi.org/10.1109/ICFHR2020.2020.00044

  12. Calvo-Zaragoza, J., Rizo, D.: End-to-end neural optical music recognition of monophonic scores. Appl. Sci. 8(4), 606 (2018)

  13. Alfaro-Contreras, M., Calvo-Zaragoza, J., Iñesta, J.M.: Approaching end-to-end optical music recognition for homophonic scores. In: Morales, A., Fierrez, J., Sánchez, J.S., Ribeiro, B. (eds.) IbPRIA 2019. LNCS, vol. 11868, pp. 147–158. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-31321-0_13

  14. Hankinson, A., Roland, P., Fujinaga, I.: The music encoding initiative as a document-encoding framework. In: Proceedings of the 12th International Society for Music Information Retrieval Conference, pp. 293–298. ISMIR, Miami, United States (2011). https://doi.org/10.5281/zenodo.1417609

  15. Kim, S., Lee, H., Park, S., Lee, J., Choi, K.: Deep composer classification using symbolic representation (2020)

  16. Pinheiro, P.H.O., Collobert, R.: Recurrent convolutional neural networks for scene parsing (2013). arXiv:abs/1306.2795

  17. Tsai, T.J., Ji, K.: Composer style classification of piano sheet music images using language model pretraining (2020). arXiv:abs/2007.14587

  18. Ríos-Vila, A., Iñesta, J.M., Calvo-Zaragoza, J.: On the use of transformers for end-to-end optical music recognition. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds.) Pattern Recognition and Image Analysis, pp. 470–481. Springer International Publishing, Cham (2022). https://doi.org/10.1007/978-3-031-04881-4_37

  19. Merity, S., Keskar, N.S., Socher, R.: Regularizing and optimizing LSTM language models (2017). arXiv:abs/1708.02182

  20. Melis, G., Kočiský, T., Blunsom, P.: Mogrifier LSTM (2019). arXiv:abs/1909.01792

  21. Wang, Q., Wu, B., Zhu, P., Li, P., Zuo, W., Hu, Q.: ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531–11539 (2020). https://doi.org/10.1109/CVPR42600.2020.01155

  22. Tuggener, L., Elezi, I., Schmidhuber, J., Pelillo, M., Stadelmann, T.: DeepScores: a dataset for segmentation, detection and classification of tiny objects. In: 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3704–3709 (2018). https://doi.org/10.1109/ICPR.2018.8545307

  23. Shatri, E., Fazekas, G.: DoReMi: first glance at a universal OMR dataset (2021). arXiv:abs/2107.07786

  24. Liu, A., et al.: Residual recurrent CRNN for end-to-end optical music recognition on monophonic scores. In: Proceedings of the 2021 Workshop on Multi-Modal Pre-Training for Multimedia Understanding, MMPT ’21, pp. 23–27. Association for Computing Machinery, New York (2021). https://doi.org/10.1145/3463945.3469056

  25. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018). https://doi.org/10.1109/CVPR.2018.00745

  26. Liu, Y., Chen, H., Shen, C., He, T., Jin, L., Wang, L.: ABCNet: real-time scene text spotting with adaptive Bezier-curve network. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9806–9815 (2020). https://doi.org/10.1109/CVPR42600.2020.00983

  27. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision - ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1

  28. Dai, Y., Gieseke, F., Oehmcke, S., Wu, Y., Barnard, K.: Attentional feature fusion. In: 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 3559–3568 (2020)

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 62072388), the Collaborative Project Fund of the Fuzhou-Xiamen-Quanzhou Innovation Zone (No. 3502ZCQXT202001), the 2020 Industry Guidance Project Foundation of the Science and Technology Bureau of Fujian Province (No. 2020H0047), and the Fujian Sunshine Charity Foundation.

Author information

Correspondence to Junfeng Yao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

He, R., Yao, J. (2024). End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_32

  • DOI: https://doi.org/10.1007/978-981-99-8432-9_32

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8431-2

  • Online ISBN: 978-981-99-8432-9

  • eBook Packages: Computer Science, Computer Science (R0)