DOI: 10.1145/3573128.3604895
research-article

Synchronous Recognition of Music Images Using Coupled N-Gram Models

Published: 22 August 2023

Abstract

Handwritten music recognition (HMR) studies technologies for automatically transcribing handwritten music pieces that exist only as images, so that they can be made available to the general public. Many historical music pieces consist of a music part and a lyrics part. HMR has focused mainly on transcribing the music elements of historical images, yet in many pieces both music and lyrics are present and relevant. The two parts are generally recognized as separate tasks, even though in many historical documents they are synchronized at the line level and loosely at the word level. Music and lyrics are strongly related, each affecting the other, and exploiting this relation may improve recognition results for both parts as well as later steps such as music analysis and composition analysis. This paper introduces a preliminary system that transcribes the music and lyrics of handwritten historical music images synchronously and simultaneously. Results on a historical manuscript dataset show an improvement of up to 15.4% in symbol rate on stave recognition, and an average improvement of up to approximately 7.6% when the music and lyrics parts are considered jointly.
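The abstract does not describe how the music and lyrics models are coupled. As a rough illustration of the general idea only, and not the authors' model, the following Python sketch borrows the transition structure of coupled hidden Markov models, in which each chain's next state depends on the previous state of both chains. It assumes that music and lyrics have already been split into synchronized positions (one music token per lyric token), uses add-one smoothing, and rescores short candidate lists instead of decoding images. The class `CoupledBigram`, its methods, and the toy pitch and syllable tokens are all hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import Sequence, Tuple

BOS = "<s>"  # hypothetical start-of-sequence symbol shared by both streams


class CoupledBigram:
    """Hypothetical coupled bigram model over two synchronized token streams.

    Estimates P(m_t | m_{t-1}, l_{t-1}) and P(l_t | l_{t-1}, m_{t-1}): each
    stream is conditioned on its own previous token and on the other stream's
    previous token. Add-one smoothing, with one extra vocabulary slot for
    unseen tokens.
    """

    def __init__(self) -> None:
        self.m_ctx: Counter = Counter()    # (m_prev, l_prev) -> count
        self.m_trans: Counter = Counter()  # (m_prev, l_prev, m) -> count
        self.l_ctx: Counter = Counter()    # (l_prev, m_prev) -> count
        self.l_trans: Counter = Counter()  # (l_prev, m_prev, l) -> count
        self.m_vocab: set = set()
        self.l_vocab: set = set()

    def fit(self, corpus: Sequence[Tuple[Sequence[str], Sequence[str]]]) -> "CoupledBigram":
        for music, lyrics in corpus:
            if len(music) != len(lyrics):
                raise ValueError("streams must be synchronized position by position")
            m_prev, l_prev = BOS, BOS
            for m, l in zip(music, lyrics):
                self.m_ctx[(m_prev, l_prev)] += 1
                self.m_trans[(m_prev, l_prev, m)] += 1
                self.l_ctx[(l_prev, m_prev)] += 1
                self.l_trans[(l_prev, m_prev, l)] += 1
                self.m_vocab.add(m)
                self.l_vocab.add(l)
                m_prev, l_prev = m, l
        return self

    @staticmethod
    def _smoothed(trans: Counter, ctx: Counter, context: Tuple[str, str], tok: str, v: int) -> float:
        # Add-one estimate: (count(context, tok) + 1) / (count(context) + v)
        return (trans[context + (tok,)] + 1) / (ctx[context] + v)

    def log_prob(self, music: Sequence[str], lyrics: Sequence[str]) -> float:
        if len(music) != len(lyrics):
            raise ValueError("streams must be synchronized position by position")
        vm, vl = len(self.m_vocab) + 1, len(self.l_vocab) + 1
        m_prev, l_prev = BOS, BOS
        total = 0.0
        for m, l in zip(music, lyrics):
            # Joint log-probability: sum of both coupled transitions at each position.
            total += math.log(self._smoothed(self.m_trans, self.m_ctx, (m_prev, l_prev), m, vm))
            total += math.log(self._smoothed(self.l_trans, self.l_ctx, (l_prev, m_prev), l, vl))
            m_prev, l_prev = m, l
        return total


# Toy synchronized corpus (hypothetical): pitch tokens aligned with syllables.
corpus = [
    (["C4", "D4", "E4"], ["Ky", "ri", "e"]),
    (["C4", "D4", "F4"], ["Ky", "ri", "e"]),
    (["G4", "A4", "G4"], ["Glo", "ri", "a"]),
]
model = CoupledBigram().fit(corpus)

# Rescore every (music, lyrics) pairing of two hypothetical candidate lists.
music_hyps = [("C4", "D4", "E4"), ("G4", "A4", "G4")]
lyric_hyps = [("Ky", "ri", "e"), ("Glo", "ri", "a")]
best = max(itertools.product(music_hyps, lyric_hyps), key=lambda p: model.log_prob(*p))
print(best)  # (('C4', 'D4', 'E4'), ('Ky', 'ri', 'e'))
```

Because each stream is conditioned on the other, a lyric hypothesis that never followed a given melodic context is penalized, so mismatched pairings score lower than consistent ones. A real system would also need end-of-sequence handling, better smoothing, and a decoder that searches over the recognizer's outputs rather than a fixed candidate list.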

Published In

DocEng '23: Proceedings of the ACM Symposium on Document Engineering 2023
August 2023
187 pages
ISBN:9798400700279
DOI:10.1145/3573128

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2023
Accepted: 04 June 2023
Revised: 04 June 2023
Received: 01 May 2023

Author Tags

  1. Coupled Models
  2. Handwritten music recognition
  3. N-gram language models
  4. Synchronous recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Generalitat Valenciana
  • MCIN/AEI/10.13039/501100011033

Conference

DocEng '23
DocEng '23: ACM Symposium on Document Engineering 2023
August 22 - 25, 2023
Limerick, Ireland

Acceptance Rates

DocEng '23 paper acceptance rate: 9 of 27 submissions (33%)
Overall acceptance rate: 194 of 564 submissions (34%)

Article Metrics

  • Total citations: 0
  • Total downloads: 48
  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 2
Reflects downloads up to 05 Mar 2025.
