Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books

Santamaría, Gonzalo; Domínguez, César; Heras, Jónathan; Mata, Eloy; Pascual, Vico

doi:10.1007/978-3-031-06555-2_37

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13237)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Digitizing historical music books is challenging because staves are usually interleaved with typewritten text that explains some of their characteristics. In this work, we propose a new methodology for this digitization task. After the pages of the book are scanned, image processing techniques detect the blocks of text and staves and organize them into music pieces. OCR and OMR methods are then applied to the text and stave blocks, respectively, and the resulting information is stored in the MusicXML format. We also explain how this methodology was successfully applied to the digitization of a book entitled “The Music in the Santo Domingo’s Cathedral”. In particular, we provide a new annotated database of musical symbols from the staves included in this book. This database was used to develop two new OMR deep learning models for the detection and classification of music scores. The detection model obtained an F1-score of $90\%$ on symbol detection, and the classification model obtained a note pitch accuracy of $98.4\%$. The method allows us to conduct text searches, obtain clean PDF files of music pieces, or reproduce the sound represented by the pieces. The database, models, and code of this project are available at https://github.com/joheras/MusicaCatedralStoDomingoIER.
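To illustrate the storage step described in the abstract, the following minimal sketch shows how recognized text and pitches might be written to a MusicXML document using only the Python standard library. It is not taken from the project repository: the function names `build_musicxml` and `title_matches`, the single-measure layout, the fixed G clef, and the quarter-note durations are all assumptions made for illustration. The `title` argument stands in for OCR output and the `pitches` argument for OMR output.

```python
import xml.etree.ElementTree as ET


def build_musicxml(title, pitches):
    """Build a one-measure MusicXML score from (step, octave) pairs.

    Hypothetical helper: `title` stands in for OCR output and `pitches`
    for OMR output. Clef and note durations are fixed for simplicity.
    """
    score = ET.Element("score-partwise", version="3.1")

    # Text recognized by OCR is stored as the work title.
    work = ET.SubElement(score, "work")
    ET.SubElement(work, "work-title").text = title

    # Every part must be declared in the part list before it is used.
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"

    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")

    # Assumed attributes: one division per quarter note, G clef on line 2.
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = "1"
    clef = ET.SubElement(attributes, "clef")
    ET.SubElement(clef, "sign").text = "G"
    ET.SubElement(clef, "line").text = "2"

    # Each recognized pitch becomes a quarter note.
    for step, octave in pitches:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = "1"
        ET.SubElement(note, "type").text = "quarter"
    return score


def title_matches(score, query):
    """Case-insensitive text search over the stored work title."""
    title = score.findtext("work/work-title", default="")
    return query.lower() in title.lower()


if __name__ == "__main__":
    example = build_musicxml("Kyrie", [("C", 4), ("E", 4), ("G", 4)])
    print(ET.tostring(example, encoding="unicode"))
    print(title_matches(example, "kyrie"))
```

Because the recognized text and the notes are kept in one structured file, the same document can in principle support the text search, clean rendering, and playback uses mentioned in the abstract.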

This work was partially supported by Grant RTC-2017-6640-7; and by MCIN/AEI/10.13039/501100011033, under Grant PID2020-115225RB-I00.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of La Rioja, Logroño, Spain
Gonzalo Santamaría, César Domínguez, Jónathan Heras, Eloy Mata & Vico Pascual

Corresponding author

Correspondence to César Domínguez.

Editor information

Editors and Affiliations

Kyushu University, Fukuoka, Japan
Seiichi Uchida
Boise State University, Boise, ID, USA
Elisa Barney
LIRIS UMR CNRS, Villeurbanne, France
Véronique Eglin

About this paper

Cite this paper

Santamaría, G., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2022). Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_37

DOI: https://doi.org/10.1007/978-3-031-06555-2_37
Published: 18 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition
