Abstract
Optical Music Recognition (OMR) is the research field that investigates how computers can read music notation in documents. In this paper, we propose ATTML, an end-to-end OMR model based on memory unit optimization and attention mechanisms. First, we replace the original LSTM memory unit with a Mogrifier LSTM memory unit, which lets the input and hidden states interact fully and yields better context-dependent representations. Second, we augment the decoder with the ECA attention mechanism, enabling the model to focus on salient features and patterns in the input data. We train jointly on three existing music datasets: PrIMuS, DoReMi, and DeepScores. We conduct ablation experiments with a range of attention mechanisms and memory optimization units. Furthermore, we use the musical score density metric SnSl to compare our model with others and to assess its performance on dense musical scores specifically. Comparative and ablation results show that the proposed method outperforms previous state-of-the-art methods in accuracy and robustness.
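The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the ATTML implementation: the function names (`mogrify`, `eca_attention`), the argument layout, and the fixed kernel size are assumptions made for illustration. The gating rule follows the general form described for the Mogrifier LSTM (input and hidden state alternately rescale each other before the usual LSTM update), and the channel attention follows the general ECA recipe (global average pooling, a 1D convolution across channels, a sigmoid gate). How ATTML wires these into its encoder and decoder is not specified here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mogrify(x, h, weights):
    """Alternately gate the input x and hidden state h (hypothetical helper).

    weights[0], weights[2], ... have shape (len(x), len(h)) and rescale x from h;
    weights[1], weights[3], ... have shape (len(h), len(x)) and rescale h from x.
    The gated pair would then be fed to an ordinary LSTM cell.
    """
    for i, w in enumerate(weights):
        if i % 2 == 0:
            x = 2.0 * sigmoid(w @ h) * x  # x modulated by h
        else:
            h = 2.0 * sigmoid(w @ x) * h  # h modulated by the updated x
    return x, h


def eca_attention(features, kernel):
    """Channel attention in the style of ECA (hypothetical helper).

    features: array of shape (C, H, W); kernel: odd-length 1D weight vector.
    ECA chooses the kernel size adaptively from C; here it is fixed by the caller.
    """
    pooled = features.mean(axis=(1, 2))      # global average pooling, shape (C,)
    k = len(kernel)
    padded = np.pad(pooled, k // 2)          # zero padding keeps C outputs
    scores = np.array([padded[c:c + k] @ kernel for c in range(len(pooled))])
    gate = sigmoid(scores)                   # one weight per channel
    return features * gate[:, None, None]
```

With all-zero weights every gate equals 2 · σ(0) = 1, so `mogrify` returns its inputs unchanged; learned weights let each state amplify or suppress components of the other. Similarly, `eca_attention` with a zero kernel scales every channel by σ(0) = 0.5.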
Acknowledgements
This work was supported by the Natural Science Foundation of China (No. 62072388), the Collaborative Project Fund of the Fuzhou-Xiamen-Quanzhou Innovation Zone (No. 3502ZCQXT202001), the industry guidance project foundation of the science and technology bureau of Fujian province in 2020 (No. 2020H0047), and the Fujian Sunshine Charity Foundation.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
He, R., Yao, J. (2024). End-to-End Optical Music Recognition with Attention Mechanism and Memory Units Optimization. In: Liu, Q., et al. Pattern Recognition and Computer Vision. PRCV 2023. Lecture Notes in Computer Science, vol 14426. Springer, Singapore. https://doi.org/10.1007/978-981-99-8432-9_32
DOI: https://doi.org/10.1007/978-981-99-8432-9_32
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8431-2
Online ISBN: 978-981-99-8432-9
eBook Packages: Computer Science, Computer Science (R0)