Synchronous Recognition of Music Images Using Coupled N-Gram Models

Authors: Manuel Villarreal, Joan Andreu Sánchez

DocEng '23: Proceedings of the ACM Symposium on Document Engineering 2023

Article No.: 31, Pages 1 - 9

https://doi.org/10.1145/3573128.3604895

Published: 22 August 2023

Abstract

Handwritten music recognition studies technologies for automatically transcribing handwritten music pieces that exist only as images, so that they can be made available to the general public. Many historical music pieces comprise a music part and a lyrics part. Handwritten music recognition has focused mainly on transcribing the music elements in historical images, but in many pieces both music and lyrics are present and relevant. The two parts are generally recognized as separate tasks. In many historical documents, however, they are synchronized at line level and loosely at word level, and they are strongly related, each affecting the other. Discovering this relation may help improve recognition results for both parts and benefit later steps such as music analysis and composition analysis. This paper introduces a preliminary system that synchronously and simultaneously transcribes both the music and the lyrics of handwritten historical music images. Results on a historical manuscript dataset show that the system achieves an improvement of up to 15.4% in symbol rate on stave recognition and an average improvement of up to approximately 7.6% when the music and lyrics parts are considered jointly.


Published In

DocEng '23: Proceedings of the ACM Symposium on Document Engineering 2023

August 2023

187 pages

ISBN:9798400700279

DOI:10.1145/3573128

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2023

Accepted: 04 June 2023

Revised: 04 June 2023

Received: 01 May 2023


Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Generalitat Valenciana
MCIN/AEI/10.13039/501100011033

Conference

Sponsor: SIGWEB

DocEng '23: ACM Symposium on Document Engineering 2023

August 22 - 25, 2023

Limerick, Ireland

Acceptance Rates

DocEng '23 Paper Acceptance Rate 9 of 27 submissions, 33%;

Overall Acceptance Rate 194 of 564 submissions, 34%
