Abstract
Handwritten music recognition (HMR) is the task of transcribing the content of music score images. Accurately detecting music objects at the page level is one of the main challenges of HMR. Existing methods struggle with the tiny, densely packed symbols of handwritten music notation and achieve good detection accuracy only on snippets. In this paper, we propose a detector for page-level handwritten music object recognition that consists of a staff line removal model and a handwritten music object detection model. First, an end-to-end staff line removal model, R_Staff_Net, based on residual learning, reduces the complexity of page-level detection. Second, we develop an improved YOLO-V4 model for handwritten music object detection. The main improvements are as follows: a decoupled detection head and a visual attention module are adopted in YOLO-V4; an adaptive multi-scale feature fusion module (AMFFM) enhances the textures and features of tiny music symbols in the deep convolutional layers; and a gradient harmonized mechanism addresses the inherent imbalance among music objects. We validated R_Staff_Net and the improved YOLO-V4 model on the ICDAR/GREC staff line removal dataset and the MUSCIMA++ dataset, respectively. The experiments show that R_Staff_Net achieves an F-measure (F-M) of 98.64% and that, with page-level input, our improved YOLO-V4 model outperforms other handwritten music symbol detection methods with a mean average precision (mAP) of 91.8%. Although the experimental results show that R_Staff_Net contributes little to the detector's overall mAP, it benefits symbols that resemble staff lines or heavily overlap them.
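The abstract only names the gradient harmonized mechanism used against object imbalance. As a rough illustration of how such reweighting works, here is a minimal Python sketch of the GHM-C classification loss of Li et al. (2019). It assumes a sigmoid binary cross-entropy head, ten equal-width gradient bins, and no moving-average (momentum) smoothing of the bin counts. The bin count, the loss form and the helper names `ghm_c_weights` and `ghm_c_loss` are hypothetical choices for this example, not the configuration reported in this paper.

```python
import math
from typing import List, Sequence

# Sketch only: bin count and the absence of momentum smoothing are assumptions,
# not the settings used by the paper's detector.


def ghm_c_weights(probs: Sequence[float], labels: Sequence[int], bins: int = 10) -> List[float]:
    """Per-example GHM-C weights using equal-width bins (unit-region approximation)."""
    n = len(probs)
    # Gradient norm of sigmoid BCE w.r.t. the logit: g = |p - y|, which lies in [0, 1].
    grads = [abs(p - y) for p, y in zip(probs, labels)]
    # Map each gradient norm to one of `bins` equal-width bins; g == 1.0 goes to the last bin.
    bin_ids = [min(int(g * bins), bins - 1) for g in grads]
    counts = [0] * bins
    for b in bin_ids:
        counts[b] += 1
    nonempty = sum(1 for c in counts if c > 0)
    # beta_i = N / GD(g_i), with GD(g_i) approximated by (examples in bin) * (non-empty bins).
    # Crowded bins (typically many easy examples) get small weights; sparse bins get large ones.
    return [n / (counts[b] * nonempty) for b in bin_ids]


def ghm_c_loss(probs: Sequence[float], labels: Sequence[int], bins: int = 10, eps: float = 1e-12) -> float:
    """Binary cross-entropy reweighted by GHM-C weights and normalised by N."""
    weights = ghm_c_weights(probs, labels, bins)
    total = 0.0
    for w, p, y in zip(weights, probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Three easy positives, one hard positive, one moderately hard negative.
    probs = [0.97, 0.96, 0.98, 0.15, 0.35]
    labels = [1, 1, 1, 1, 0]
    print(ghm_c_weights(probs, labels))
    print(ghm_c_loss(probs, labels))
```

In this toy batch the three easy positives share one crowded bin and each receives a weight of 5/9 (about 0.56). The hard positive and the negative sit alone in their bins and each receive 5/3 (about 1.67). The weights always sum to N, so the total loss scale is preserved while easy, abundant examples are down-weighted.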
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Baró A, Riba P, Fornés A (2016) Towards the recognition of compound music notes in handwritten music scores. Paper presented at the 15th international conference on frontiers in handwriting recognition (ICFHR), IEEE, Shenzhen, 23–26 Oct 2016
Baró A, Riba P, Calvo-Zaragoza J et al (2019) From optical music recognition to handwritten music recognition: a baseline. Pattern Recognit Lett 123:1–8. https://doi.org/10.1016/j.patrec.2019.02.029
Bochkovskiy A, Wang C, Liao H (2020) YOLOv4: optimal speed and accuracy of object detection. Preprint at arXiv:2004.10934
Calvo-Zaragoza J, Rizo D (2018) End-to-end neural optical music recognition of monophonic scores. Appl Sci 8:606. https://doi.org/10.3390/app8040606
Calvo-Zaragoza J, Micó L, Oncina J (2016) Music staff removal with supervised pixel classification. Int J Doc Anal Recognit 19:211–219. https://doi.org/10.1007/s10032-016-0266-2
Calvo-Zaragoza J, Pertusa A, Oncina J (2017a) Staff-line detection and removal using a convolutional neural network. Mach Vis Appl 28:665–674. https://doi.org/10.1007/s00138-017-0844-4
Calvo-Zaragoza J, Valero-Mas J, Pertusa A (2017b) End-to-end optical music recognition using neural networks. Paper presented at the 18th international society for music information retrieval conference, ISMIR, Suzhou, 23–27 Oct 2017
Cao J, Li Y, Sun M, et al (2020) DO-Conv: depthwise over-parameterized convolutional layer. Preprint at arXiv:2006.12030
Dai J, Li Y, He K, et al (2016) R-FCN: object detection via region-based fully convolutional networks. Paper presented at the 30th international conference on neural information processing systems, MIT Press, Barcelona, 5–10 Dec 2016
Escalera S, Fornés A, Pujol O et al (2009) Blurred shape model for binary and grey-level symbol recognition. Pattern Recognit Lett 30:1424–1433. https://doi.org/10.1016/j.patrec.2009.08.001
Fornés A, Lladós J, Sánchez G et al (2010) Rotation invariant hand-drawn symbol recognition based on a dynamic time warping model. Int J Doc Anal Recognit (IJDAR) 13:229–241. https://doi.org/10.1007/s10032-010-0114-8
Fornés A, Dutta A, Gordo A, et al (2011) The ICDAR 2011 music scores competition: staff removal and writer identification. Paper presented at the 2011 international conference on document analysis and recognition, IEEE, Beijing, 18–21 Sep 2011
Fornés A, Dutta A, Gordo A et al (2012) CVC-MUSCIMA: a ground truth of handwritten music score images for writer identification and staff removal. Int J Doc Anal Recognit 15:243–251. https://doi.org/10.1007/s10032-011-0168-2
Gai R, Chen N, Yuan H (2021) A detection algorithm for cherry fruits based on the improved YOLO-V4 model. Neural Comput Appl. https://doi.org/10.1007/s00521-021-06029-z
Gallego A, Calvo-Zaragoza J (2017) Staff-line removal with selectional auto-encoders. Expert Syst Appl 89:138–148. https://doi.org/10.1016/j.eswa.2017.07.002
Géraud T (2014) A morphological method for music score staff removal. Paper presented at the 2014 IEEE international conference on image processing, IEEE, Paris, 27–30 Oct 2014
Hajič J, Pecina P (2017) Detecting noteheads in handwritten scores with ConvNets and bounding box regression. Preprint at arXiv:1708.01806
Hajič J, Pecina P (2017) The MUSCIMA++ dataset for handwritten optical music recognition. Paper presented at the 14th IAPR international conference on document analysis and recognition, IEEE, Kyoto, 9–12 Nov 2017
Hajič J, Dorfer M, Widmer G, et al (2018) Towards full-pipeline handwritten OMR with musical symbol detection by U-Nets. Paper presented at the 19th international society for music information retrieval conference, ISMIR, Paris, 23–27 Sep 2018
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. Paper presented at the 2018 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18–22 Jun 2018
K. A, K. A, B. A, et al (2018) Staff line removal using generative adversarial networks. Paper presented at the 24th international conference on pattern recognition, IEEE, Beijing, 20–24 Aug 2018
Lai W, Huang J, Ahuja N, et al (2017) Deep Laplacian pyramid networks for fast and accurate super-resolution. Paper presented at the 2017 IEEE conference on computer vision and pattern recognition, IEEE, Honolulu, 21–26 Jul 2017
Li B, Liu Y, Wang X (2019) Gradient harmonized single-stage detector. Paper presented at the 33rd AAAI conference on artificial intelligence, AAAI, Hawaii, 27 Jan - 1 Feb 2019
Lin T, Goyal P, Girshick R, et al (2017) Focal loss for dense object detection. Paper presented at the 2017 IEEE international conference on computer vision, IEEE, Venice, 22–29 Oct 2017
Liu W, Anguelov D, Erhan D, et al (2016) SSD: single shot multibox detector. Paper presented at the 14th European conference on computer vision, Springer, Amsterdam, 8–16 Oct 2016
Dorfer M, Hajič J, Widmer G (2017) On the potential of fully convolutional neural networks for musical symbol detection. Paper presented at the 14th international conference on document analysis and recognition, IEEE, Kyoto, 9–15 Nov 2017
Montagner I, Hirata N, Hirata R Jr (2017) Staff removal using image operator learning. Pattern Recognit 63:310–320. https://doi.org/10.1016/j.patcog.2016.10.002
Pacha A, Eidenberger H (2017) Towards self-learning optical music recognition. Paper presented at the 16th international conference on machine learning and applications (ICMLA), IEEE, Cancun, 18–21 Dec 2017
Pacha A, Choi K, Coüasnon B, et al (2018) Handwritten music object detection: open issues and baseline results. Paper presented at the 13th IAPR international workshop on document analysis systems, IEEE, Vienna, 24–27 Apr 2018
Rebelo A, Fujinaga I, Paszkiewicz F et al (2012) Optical music recognition: state-of-the-art and open issues. Int J Multimed Inf Retr 1:173–190. https://doi.org/10.1007/s13735-012-0004-6
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. Preprint at arXiv:1804.02767
Ren S, He K, Girshick R, et al (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Paper presented at Advances in neural information processing systems 28, MIT Press, Montreal, 7–12 Dec 2015, pp 91–99
Rossant F, Bloch I (2007) Robust and adaptive OMR system including fuzzy modeling, fusion of musical rules, and possible error detection. EURASIP J Adv Signal Process 2007:1–25. https://doi.org/10.1155/2007/81541
Santos CD, Capela A, Rebelo A et al (2009) Staff detection with stable paths. IEEE Trans Pattern Anal Mach Intell 31:1134–1139. https://doi.org/10.1109/TPAMI.2009.34
Song G, Liu Y, Wang X (2020) Revisiting the sibling head in object detector. Paper presented at the 2020 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Seattle, 16–18 Jun 2020
Su B, Lu S, Pal U, et al (2012) An effective staff detection and removal technique for musical documents. Paper presented at the 10th IAPR international workshop on document analysis systems, IEEE, Queensland, 27–29 Mar 2012
Tuggener L, Elezi I, Schmidhuber J, et al (2018) Deep watershed detector for music object recognition. Paper presented at the 19th international society for music information retrieval conference, ISMIR, Paris, 23–27 Sep 2018
Visani M, Kieu AV, Fornés A, Journet N (2013) ICDAR 2013 music scores competition: staff removal. Paper presented at the 12th international conference on document analysis and recognition, IEEE, Washington DC, 25–28 Aug 2013
Wang J, Wang N, Li L (2020) Real-time behavior detection and judgment of egg breeders based on YOLO v3. Neural Comput Appl 32:5471–5481. https://doi.org/10.1007/s00521-019-04645-4
Woo S, Park J, Lee JY, et al (2018) CBAM: convolutional block attention module. Paper presented at the 15th European conference on computer vision, Springer, Munich, 8–14 Sep 2018
Wu Y, Chen Y, Yuan L, et al (2020) Rethinking classification and localization for object detection. Paper presented at the 2020 IEEE/CVF conference on computer vision and pattern recognition, IEEE, Seattle, 16–18 Jun 2020
Zhang H, Cisse M, Dauphin YN, et al (2017) mixup: beyond empirical risk minimization. Preprint at arXiv:1710.09412
Zheng Z, Wang P, Liu W, et al (2020) Distance-IoU loss: faster and better learning for bounding box regression. Paper presented at the 34th AAAI conference on artificial intelligence, AAAI, New York, 7–12 Feb 2020
Funding
This study was funded by the National Natural Science Foundation of China (Young Scientists Fund, grant no. 618002044).
Author information
Contributions
YZ and ZH conceived and designed the experiments. YZ performed the experiments and analyzed the data. ZH, YZ, YZ, and KR wrote the paper together.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, Y., Huang, Z., Zhang, Y. et al. A detector for page-level handwritten music object recognition based on deep learning. Neural Comput & Applic 35, 9773–9787 (2023). https://doi.org/10.1007/s00521-023-08216-6
DOI: https://doi.org/10.1007/s00521-023-08216-6