End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization

He, Ruichen; Yao, Junfeng

doi:10.1007/978-981-99-8432-9_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14426)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)

Abstract

Optical Music Recognition (OMR) is the research field that studies how computers can read the music notation in score documents. In this paper, we propose ATTML, an end-to-end OMR model based on memory unit optimization and attention mechanisms. First, we replace the original LSTM memory unit with a Mogrifier LSTM memory unit, which lets the input and the hidden state interact fully and yields better context-dependent representations. Second, we augment the decoder with the ECA attention mechanism, which helps the model focus on the salient features and patterns in the input data. We train jointly on three existing music datasets: PrIMuS, DoReMi, and DeepScores. We conduct ablation experiments with different attention mechanisms and memory optimization units. We also use the musical score density metric SnSl to compare our model with others and to assess its performance on dense musical scores specifically. Comparative and ablation results show that the proposed method outperforms previous state-of-the-art methods in accuracy and robustness.
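
The two building blocks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the general idea of Mogrifier-style mutual gating (input and hidden state rescale each other in alternating rounds) and ECA-style channel attention (global average pooling, a 1D convolution across channels, a sigmoid, then channel rescaling) as described in the Mogrifier LSTM and ECA-Net papers. The function names `mogrify` and `eca`, the list-based data layout, and the weight arguments are assumptions made for illustration; how ATTML wires these blocks into its decoder is not specified here.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def mogrify(x, h, weights):
    """Mutually gate input x and hidden state h, one round per weight matrix.

    Rounds 1, 3, 5, ... rescale x using h:  x <- 2 * sigmoid(Q h) * x
    Rounds 2, 4, ...    rescale h using x:  h <- 2 * sigmoid(R x) * h
    `weights` alternates Q (len(x) rows, len(h) columns) and
    R (len(h) rows, len(x) columns). The gated x and h would then be
    passed to an ordinary LSTM cell (not shown).
    """
    for i, w in enumerate(weights):
        if i % 2 == 0:
            x = [2.0 * sigmoid(g) * xi for g, xi in zip(matvec(w, h), x)]
        else:
            h = [2.0 * sigmoid(g) * hi for g, hi in zip(matvec(w, x), h)]
    return x, h


def eca(feature_maps, kernel):
    """ECA-style channel attention over a list of channels.

    feature_maps: list of C channels, each a flat list of activations.
    kernel: 1D convolution weights of odd length, shared by all channels.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    pooled = [sum(ch) / len(ch) for ch in feature_maps]
    # Local cross-channel interaction: zero-padded 1D convolution, no bias.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    attn = [
        sigmoid(sum(kernel[j] * padded[c + j] for j in range(len(kernel))))
        for c in range(len(pooled))
    ]
    # Excite: rescale every activation by its channel's attention weight.
    return [[a * v for v in ch] for a, ch in zip(attn, feature_maps)]
```

With all-zero weights every Mogrifier gate equals 2·σ(0) = 1, so `mogrify` returns its inputs unchanged. The factor of 2 is what lets the gating start out as an identity transformation.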

Acknowledgements

This work is supported by the Natural Science Foundation of China (No. 62072388), the Collaborative Project Fund of the Fuzhou-Xiamen-Quanzhou Innovation Zone (No. 3502ZCQXT202001), the 2020 Industry Guidance Project Foundation of the Science Technology Bureau of Fujian Province (No. 2020H0047), and the Fujian Sunshine Charity Foundation.

Author information

Authors and Affiliations

Center for Digital Media Computing, School of Film, School of Informatics, Xiamen University, Xiamen, 361005, China
Ruichen He & Junfeng Yao
Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, Xiamen, China
Junfeng Yao
Institute of Artificial Intelligence, Xiamen University, Xiamen, 361005, China
Junfeng Yao

Authors

Ruichen He
Junfeng Yao

Corresponding author

Correspondence to Junfeng Yao.

Editor information

Editors and Affiliations

Nanjing University of Information Science and Technology, Nanjing, China
Qingshan Liu
Xiamen University, Xiamen, China
Hanzi Wang
Beijing University of Posts and Telecommunications, Beijing, China
Zhanyu Ma
Sun Yat-sen University, Guangzhou, China
Weishi Zheng
Peking University, Beijing, China
Hongbin Zha
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Chinese Academy of Sciences, Beijing, China
Liang Wang
Xiamen University, Xiamen, China
Rongrong Ji

About this paper

Cite this paper

He, R., Yao, J. (2024). End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_32

DOI: https://doi.org/10.1007/978-981-99-8432-9_32
Published: 24 December 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)
