Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books

  • Conference paper
Document Analysis Systems (DAS 2022)

Abstract

Digitizing historical music books can be challenging because staves are usually interleaved with typewritten text that describes them. In this work, we propose a new methodology for this digitization task. After the pages of the book are scanned, image processing techniques are used to detect the blocks of text and staves and to organize them into music pieces. OCR and OMR methods are then applied to the text and stave blocks, respectively, and the extracted information is stored in the MusicXML format. In addition, we explain how this methodology was successfully applied to digitize a book entitled “The Music in the Santo Domingo’s Cathedral”. In particular, we provide a new annotated database of musical symbols from the staves included in this book. This database was used to develop two new deep learning OMR models for detection and classification in music scores. The detection model obtained an F1-score of \(90\%\) on symbol detection, and the classification model obtained a note pitch accuracy of \(98.4\%\). The method allows us to run text searches, obtain clean PDF files of music pieces, and reproduce the sound represented by the pieces. The database, models, and code of this project are available at https://github.com/joheras/MusicaCatedralStoDomingoIER.

This work was partially supported by Grant RTC-2017-6640-7; and by MCIN/AEI/10.13039/501100011033, under Grant PID2020-115225RB-I00.
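
As a rough illustration of the storage step mentioned in the abstract, the sketch below shows how recognized text and notes could be written to a MusicXML document using only the Python standard library. It is not the authors' code: the function name `build_musicxml`, the `(step, octave, duration, type)` tuple layout, the example title, and the fixed G clef are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_musicxml(title, notes, divisions=1):
    """Build a one-measure MusicXML string (hypothetical helper).

    `notes` is an assumed layout: (step, octave, duration, type) tuples,
    standing in for whatever an OMR classification model would output.
    """
    score = ET.Element("score-partwise", version="3.1")

    # Text recovered by OCR (here only a title) goes into the work metadata.
    work = ET.SubElement(score, "work")
    ET.SubElement(work, "work-title").text = title

    # MusicXML needs a part list declaring every part id used below.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Assumed attributes: a G clef on the second staff line.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    # One <note> element per recognized symbol.
    for step, octave, duration, note_type in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
        ET.SubElement(note, "type").text = note_type

    return ET.tostring(score, encoding="unicode")


# Example usage with made-up input values.
xml_text = build_musicxml(
    "Example piece",
    [("C", 4, 1, "quarter"), ("E", 4, 1, "quarter")],
)
print(xml_text)
```

Keeping the OCR text and the OMR notes in one MusicXML tree is what would make the text searches and sound reproduction described in the abstract possible from a single file; a real pipeline would also need key and time signatures and multiple measures.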

Author information

Corresponding author

Correspondence to César Domínguez.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Santamaría, G., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2022). Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_37

  • DOI: https://doi.org/10.1007/978-3-031-06555-2_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science (R0)
