Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images

Villarreal, Manuel; Sánchez, Joan Andreu

doi:10.1007/978-3-031-70543-4_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14806)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

Handwritten Music Recognition (HMR) addresses the problem of transcribing historical musical pieces from digital images into text. The vast number of untranscribed pieces, together with the scarcity of manually annotated training data, renders manual transcription impractical. Historical musical pieces of particular interest are those dating back to the 15th century and earlier, which are available only in their original manuscripts. Current state-of-the-art approaches rely on Convolutional and Recurrent Neural Networks (CRNN) because they process information effectively without requiring extensive datasets. This paper addresses the data scarcity challenge in HMR by proposing two approaches. First, it uses synthetic images to augment the training data, a technique that has been applied successfully in Handwritten Text Recognition (HTR). Second, it advocates image composition, which combines the images from a manuscript page to mitigate the contextual limitations of single-line processing. Although image composition has proved challenging for traditional HTR models, the regularity of historical musical document layouts makes it better suited to HMR. Together, these approaches yield a system that can exploit additional samples and contextual information to improve the recognition capabilities of HMR models. The results show a relative improvement when working with synthetic images and a substantial improvement when image composition is considered.
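
The abstract does not spell out how images are composed, so the following is only a minimal sketch of one plausible reading: several line images are padded to a common width, stacked vertically, and their transcriptions are joined into a single target sequence. The function name `compose_lines`, the `gap` and `separator` parameters, and the token strings in the usage lines are all hypothetical, and nested lists stand in for image arrays to keep the example dependency-free.

```python
from typing import List, Tuple

# A grayscale image as a list of rows, each row a list of pixel values (0-255).
Image = List[List[int]]


def compose_lines(
    lines: List[Tuple[Image, str]],
    background: int = 255,
    gap: int = 2,
    separator: str = " ",
) -> Tuple[Image, str]:
    """Stack line images vertically and join their transcriptions.

    Assumption: every image has at least one row. Narrower images are
    right-padded with the background value so all rows share one width.
    """
    if not lines:
        raise ValueError("need at least one line image")

    # The composed image is as wide as the widest input line.
    width = max(len(img[0]) for img, _label in lines)

    composed: Image = []
    for index, (img, _label) in enumerate(lines):
        # Insert blank rows between consecutive lines, not before the first.
        if index > 0:
            for _ in range(gap):
                composed.append([background] * width)
        # Copy each row, padding on the right up to the common width.
        for row in img:
            composed.append(list(row) + [background] * (width - len(row)))

    # The target transcription follows the same top-to-bottom order.
    text = separator.join(label for _img, label in lines)
    return composed, text


# Usage with two tiny dummy "line images" and made-up symbol tokens.
line_a = [[0, 0, 0], [0, 0, 0]]  # 2 rows x 3 columns
line_b = [[0, 0, 0, 0, 0]]       # 1 row x 5 columns
page, transcription = compose_lines(
    [(line_a, "clef.C note.G"), (line_b, "note.A")]
)
```

In a real pipeline the composed image and joined transcription would form one training sample for the recognizer; whether the paper uses padding, gaps, or a separator token of this kind is not stated in the abstract.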

Acknowledgements

The research presented in this work was supported by Grant PID2020-116813RB-I00a (SimancasSearch project), funded by MCIN/AEI/10.13039/501100011033, and by Grant CIACIF/2021/287, funded by Generalitat Valenciana.

Author information

Authors and Affiliations

Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, València, Spain
Manuel Villarreal & Joan Andreu Sánchez

Corresponding author

Correspondence to Manuel Villarreal.

Editor information

Editors and Affiliations

Luleå Tekniska Universitet, Luleå, Sweden
Elisa H. Barney Smith
Luleå Tekniska Universitet, Luleå, Sweden
Marcus Liwicki
Tsinghua University, Beijing, China
Liangrui Peng

About this paper

Cite this paper

Villarreal, M., Sánchez, J.A. (2024). Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_5

DOI: https://doi.org/10.1007/978-3-031-70543-4_5
Published: 09 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70542-7
Online ISBN: 978-3-031-70543-4
eBook Packages: Computer Science, Computer Science (R0)
