Abstract
Digitizing historical music books can be challenging because staves are usually interleaved with typewritten text that describes some of their characteristics. In this work, we propose a new methodology to undertake such a digitization task. After the pages of the book are scanned, the different blocks of text and staves can be detected and organized into music pieces using image processing techniques. Then, OCR and OMR methods can be applied to the text and stave blocks, respectively, and the extracted information stored in the MusicXML format. In addition, we explain how this methodology was successfully applied to the digitization of a book entitled “The Music in the Santo Domingo’s Cathedral”. In particular, we provide a new annotated database of musical symbols from the staves included in this book. This database was used to develop two new OMR deep learning models for the detection and classification of music scores. The detection model obtained an F1-score of \(90\%\) on symbol detection, and the classification model obtained a note pitch accuracy of \(98.4\%\). The method allows us to conduct text searches, obtain clean PDF files of music pieces, or reproduce the sound represented by the pieces. The database, models, and code of this project are available at https://github.com/joheras/MusicaCatedralStoDomingoIER.
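To make the block-detection step more concrete, the following is a minimal sketch of one way to separate text blocks from stave blocks on a binarized page. It is an illustrative assumption, not the method implemented in the project: the row-projection heuristic, the function names `split_blocks` and `classify_block`, and the thresholds `line_ratio` and `min_lines` are all hypothetical. The idea is that a stave block contains at least five long horizontal lines, whereas a text block does not.

```python
from typing import List, Tuple

# A binarized page: 1 = ink, 0 = background (hypothetical toy representation).
Image = List[List[int]]


def split_blocks(image: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows that contain ink."""
    blocks = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            blocks.append((start, y))
            start = None
    if start is not None:
        blocks.append((start, len(image)))
    return blocks


def classify_block(image: Image, block: Tuple[int, int],
                   line_ratio: float = 0.8, min_lines: int = 5) -> str:
    """Label a block "stave" if it holds at least min_lines long horizontal lines.

    A row counts as part of a line when ink covers at least line_ratio of the
    page width; consecutive such rows are counted as a single (thick) line.
    Both thresholds are assumptions chosen for this toy example.
    """
    start, end = block
    width = len(image[0])
    lines = 0
    previous_long = False
    for y in range(start, end):
        is_long = sum(image[y]) >= line_ratio * width
        if is_long and not previous_long:
            lines += 1
        previous_long = is_long
    return "stave" if lines >= min_lines else "text"


# Toy page: two rows of "text", a blank row, a five-line "stave", a blank row.
text_row = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
blank_row = [0] * 10
staff_line = [1] * 10
between_lines = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0]  # e.g. part of a note head

page = [text_row, text_row, blank_row]
for i in range(5):
    page.append(staff_line)
    if i < 4:
        page.append(between_lines)
page.append(blank_row)

blocks = split_blocks(page)
labels = [classify_block(page, b) for b in blocks]
print(list(zip(blocks, labels)))
```

On real scans, a step like this would follow binarization and noise removal, and the blocks labelled as text or stave would then be passed to the OCR and OMR stages, respectively.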
This work was partially supported by Grant RTC-2017-6640-7 and by MCIN/AEI/10.13039/501100011033 under Grant PID2020-115225RB-I00.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Santamaría, G., Domínguez, C., Heras, J., Mata, E., Pascual, V. (2022). Combining Image Processing Techniques, OCR, and OMR for the Digitization of Musical Books. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_37
DOI: https://doi.org/10.1007/978-3-031-06555-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2
eBook Packages: Computer Science (R0)