Abstract
Handwritten Music Recognition (HMR) is the task of transcribing historical musical pieces from digital images into a textual encoding. The vast number of untranscribed pieces makes manual transcription impractical, while the scarcity of manually annotated data limits the training of automatic systems. Of particular interest are pieces dating back to the 15th century and earlier, which survive only in their original manuscripts. Current state-of-the-art approaches rely on Convolutional Recurrent Neural Networks (CRNNs), which remain effective even when training data are limited. This paper addresses the data scarcity problem in HMR with two approaches. The first is the use of synthetic images to augment the training data, following their successful application in Handwritten Text Recognition (HTR). The second is image composition, which combines the line images of a manuscript page to mitigate the loss of context inherent in single-line processing. Although image composition has proved challenging for traditional HTR models, the regular layout of historical musical documents makes it well suited to HMR. Together, these approaches yield a system that exploits additional samples and contextual information to improve the recognition capabilities of HMR models. The results show an improvement when training with synthetic images and a substantial improvement when image composition is applied.
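The paper itself does not include source code; as a rough illustration of the image-composition idea described above, the following Python sketch stacks the line images of one manuscript page vertically and concatenates their ground-truth symbol sequences. The `compose_page` helper, the target height, the spacing, and the token vocabulary are all hypothetical choices for this sketch, not taken from the authors' implementation.

```python
# Minimal sketch of page-level image composition (hypothetical helper,
# not the authors' code). Each input is a (line image, token sequence) pair.
from PIL import Image


def compose_page(line_images, line_labels, target_height=128, spacing=16):
    """Stack the staff-line images of one page vertically and concatenate
    their token sequences, so a model sees page-level context instead of
    isolated lines."""
    scaled = []
    for img in line_images:
        # Rescale each line to a common height, preserving aspect ratio,
        # so the composite stays compatible with a fixed-height CRNN input.
        w, h = img.size
        new_w = max(1, round(w * target_height / h))
        scaled.append(img.convert("L").resize((new_w, target_height)))

    width = max(img.size[0] for img in scaled)
    height = sum(img.size[1] for img in scaled) + spacing * (len(scaled) - 1)
    page = Image.new("L", (width, height), color=255)  # white background

    y = 0
    for img in scaled:
        page.paste(img, (0, y))
        y += img.size[1] + spacing

    # The ground truth of the composite is the concatenation of the
    # per-line symbol sequences, in reading order.
    labels = [tok for line in line_labels for tok in line]
    return page, labels


if __name__ == "__main__":
    # Two blank dummy "line images" stand in for real manuscript lines.
    lines = [Image.new("L", (800, 100), 255), Image.new("L", (760, 110), 255)]
    gts = [["clef.C-2", "note.G-4"], ["note.A-4", "rest.breve"]]  # hypothetical tokens
    page_img, page_gt = compose_page(lines, gts)
    print(page_img.size, len(page_gt))
```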
Acknowledgements
The research presented in this work was supported by the SimancasSearch project, Grant PID2020-116813RB-I00a funded by MCIN/AEI/10.13039/501100011033, and by Grant CIACIF/2021/287 funded by Generalitat Valenciana.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Villarreal, M., Sánchez, J.A. (2024). Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_5
DOI: https://doi.org/10.1007/978-3-031-70543-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70542-7
Online ISBN: 978-3-031-70543-4