Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Handwritten Music Recognition (HMR) addresses the problem of transcribing historical musical pieces from digital images into text. The vast number of untranscribed pieces, together with the scarcity of manually annotated training data, makes manual transcription impractical. Of particular interest are pieces dating back to the 15th century and earlier, which are available only in their original manuscripts. Current state-of-the-art approaches rely on Convolutional Recurrent Neural Networks (CRNN) because they are effective without requiring extensive datasets. This paper addresses data scarcity in HMR with two approaches. The first uses synthetic images to augment the training data, building on the successful application of this technique in Handwritten Text Recognition (HTR). The second is image composition: combining the line images from a manuscript page to mitigate the contextual limitations of single-line processing. Although this strategy has faced challenges in traditional HTR models, the regular layout of historical musical documents makes image composition well suited to HMR. Together, these approaches yield a system that exploits additional samples and contextual information to improve the recognition capabilities of HMR models. The results show a relative improvement when synthetic images are used and a substantial improvement when image composition is applied.
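To make the idea of image composition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that composition means stacking the line images of one page vertically and concatenating their transcripts in reading order; the abstract does not specify the exact procedure. The function name `compose_lines` and the parameters `pad_value` and `gap` are hypothetical and chosen only for illustration.

```python
import numpy as np


def compose_lines(line_images, transcripts, pad_value=255, gap=0):
    """Stack grayscale line images vertically into one composed image.

    Assumption (not from the paper): lines are right-padded with a
    background value to a common width, optionally separated by `gap`
    background rows, and their transcripts are joined with spaces.
    """
    if not line_images:
        raise ValueError("at least one line image is required")
    if len(line_images) != len(transcripts):
        raise ValueError("each line image needs exactly one transcript")

    # The composed image is as wide as the widest line.
    width = max(img.shape[1] for img in line_images)

    rows = []
    for i, img in enumerate(line_images):
        # Right-pad narrower lines with background pixels.
        pad = width - img.shape[1]
        rows.append(np.pad(img, ((0, 0), (0, pad)), constant_values=pad_value))
        # Insert an optional background gap between consecutive lines.
        if gap and i < len(line_images) - 1:
            rows.append(np.full((gap, width), pad_value, dtype=img.dtype))

    return np.vstack(rows), " ".join(transcripts)
```

A recognizer trained on such composed samples sees several staves at once, which is one plausible way to supply the cross-line context that single-line processing lacks. For example, `compose_lines([line_a, line_b], ["c1 n1", "n2"], gap=2)` returns one taller image together with the joined transcript.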

Notes

  1. http://www.speech.sri.com/projects/srilm/manpages/ngram-count.1.html

  2. https://www.thelatinlibrary.com/medieval.html

  3. https://github.com/gregorio-project/hyphen-la

  4. https://github.com/Belval/TextRecognitionDataGenerator

  5. https://github.com/jpuigcerver/PyLaia

  6. https://kaldi-asr.org/

  7. https://openfst.org/

Acknowledgements

The research presented in this work was supported by Grant PID2020-116813RB-I00a funded by MCIN/AEI/10.13039/501100011033 SimancasSearch project and by Grant CIACIF/2021/287 funded by Generalitat Valenciana.

Author information

Corresponding author

Correspondence to Manuel Villarreal.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Villarreal, M., Sánchez, J.A. (2024). Enhancing Recognition of Historical Musical Pieces with Synthetic and Composed Images. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_5

  • DOI: https://doi.org/10.1007/978-3-031-70543-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70542-7

  • Online ISBN: 978-3-031-70543-4

  • eBook Packages: Computer Science, Computer Science (R0)
