A detector for page-level handwritten music object recognition based on deep learning

  • Original Article
Neural Computing and Applications

Abstract

Handwritten music recognition (HMR) is the task of transcribing the content of music score images. Accurate detection of music objects at the page level is one of its main challenges. Existing methods struggle with the tiny, dense nature of handwritten music notation and achieve good detection accuracy only on snippets. In this paper, we propose a detector for page-level handwritten music object recognition that consists of a staff line removal model and a handwritten music object detection model. First, R_Staff_Net, an end-to-end staff line removal model based on residual learning, reduces the complexity of page-level detection. Second, we develop an improved YOLO-V4 model for handwritten music object detection. The improvements are as follows: a decoupled detection head and a visual attention module are adopted in YOLO-V4; an adaptive multi-scale feature fusion module (AMFFM) enhances the textures and features of tiny music symbols in the deep convolution layers; and the gradient harmonized mechanism addresses the inherent imbalance between music objects. We evaluated R_Staff_Net on the ICDAR/GREC staff line removal dataset and the improved YOLO-V4 model on the MUSCIMA++ dataset. The experiments show that R_Staff_Net achieves an F-measure (F-M) of 98.64% and that the improved YOLO-V4 model outperforms other handwritten music symbol detection methods, with a mean average precision (mAP) of 91.8% on page-level input. Although the experimental results show that R_Staff_Net contributes little to the overall mAP, the network is beneficial for symbols that resemble staff lines or heavily overlap them.
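To make the staff line removal metric concrete, the following is a minimal sketch of an F-measure computed from precision and recall. It assumes a pixel-level evaluation in which staff-line pixels are the positive class; the abstract does not state the exact protocol, so this may differ from the authors' setup. The function name `f_measure` and the toy masks are hypothetical and serve only as an illustration.

```python
def f_measure(predicted, truth):
    """Pixel-level F-measure for two binary masks (2-D lists of 0/1).

    Assumption: 1 marks a staff-line pixel (the positive class).
    """
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1      # staff pixel correctly detected
            elif p and not t:
                fp += 1      # non-staff pixel wrongly marked as staff
            elif t and not p:
                fn += 1      # staff pixel missed
    if tp == 0:
        return 0.0           # avoid division by zero when nothing matches
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: one staff line of four pixels in the ground truth.
truth = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
# Hypothetical prediction: one staff pixel missed, one false alarm.
predicted = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
score = f_measure(predicted, truth)
print(f"F-measure: {score:.2%}")
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F-measure is 0.75 as well.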

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This study was funded by the National Natural Science Foundation of China (Young Scientists Fund) 618002044.

Author information

Contributions

YZ and ZH conceived of and designed the experiments. YZ performed the experiments and analyzed the data. ZH, YZ, YZ, and KR wrote this paper together.

Corresponding author

Correspondence to Zhiqing Huang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Huang, Z., Zhang, Y. et al. A detector for page-level handwritten music object recognition based on deep learning. Neural Comput & Applic 35, 9773–9787 (2023). https://doi.org/10.1007/s00521-023-08216-6

  • DOI: https://doi.org/10.1007/s00521-023-08216-6
