A detector for page-level handwritten music object recognition based on deep learning

Zhang, Yusen; Huang, Zhiqing; Zhang, Yanxin; Ren, Keyan

doi:10.1007/s00521-023-08216-6

Original Article
Published: 20 January 2023

Volume 35, pages 9773–9787, (2023)
Neural Computing and Applications

Yusen Zhang¹,
Zhiqing Huang ORCID: orcid.org/0000-0002-8921-8238¹,
Yanxin Zhang² &
…
Keyan Ren¹

Abstract

Handwritten music recognition (HMR) is the technology of transcribing the content of images of music scores. Accurate detection of music objects at the page level is one of the main challenges of HMR. Existing methods struggle with the tiny and dense nature of handwritten music notation and achieve good detection accuracy only on snippets. In this paper, we propose a detector for page-level handwritten music object recognition that consists of a staff line removal model and a handwritten music object detection model. First, an end-to-end staff line removal model based on residual learning, R_Staff_Net, reduces the complexity of page-level detection. Second, we develop an improved YOLO-V4 model for handwritten music object detection. The main improvements are as follows: a decoupled detection head and a visual attention module are adopted in YOLO-V4; an adaptive multi-scale feature fusion module (AMFFM) enhances the textures and features of tiny music symbols in the deep convolution layers; and the gradient harmonized mechanism addresses the inherent imbalance between music objects. We verified R_Staff_Net and the improved YOLO-V4 model on the ICDAR/GREC staff line removal dataset and the MUSCIMA++ dataset, respectively. The experiments show that R_Staff_Net achieves an F-M score of 98.64%, and that our improved YOLO-V4 model outperforms other handwritten music symbol detection methods, with a mean average precision (mAP) of 91.8% on page-level input. Although the experimental results show that R_Staff_Net contributes little to the overall mAP, the network is beneficial for symbols that resemble staff lines or heavily overlap them.
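
The abstract does not spell out how the gradient harmonized mechanism is applied, so the following is only a minimal sketch of the general idea in its commonly described classification form: examples are binned by gradient norm, and each example is down-weighted by the density of its bin, so that a large population of easy examples does not dominate the loss. The function names `ghm_weights` and `ghm_c_loss`, the bin count, and the toy data are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def ghm_weights(probs, targets, bins=10):
    """Per-example weights N / GD(g), with the density GD approximated per bin.

    probs   -- predicted probabilities in (0, 1)
    targets -- binary labels (0 or 1)
    bins    -- number of equal-width bins over the gradient norm range [0, 1]
    """
    n = len(probs)
    # Gradient norm of sigmoid cross-entropy w.r.t. the logit: g = |p - p*|.
    g = [abs(p - t) for p, t in zip(probs, targets)]
    # Bin index of each example; g == 1.0 is clamped into the last bin.
    idx = [min(int(gi * bins), bins - 1) for gi in g]
    counts = [0] * bins
    for j in idx:
        counts[j] += 1
    # Density = (examples in bin) / (bin width), where bin width = 1 / bins.
    return [n / (counts[j] * bins) for j in idx]


def ghm_c_loss(probs, targets, bins=10, eps=1e-12):
    """Density-weighted binary cross-entropy, averaged over all examples."""
    weights = ghm_weights(probs, targets, bins)
    ce = [
        -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
        for p, t in zip(probs, targets)
    ]
    return sum(w * c for w, c in zip(weights, ce)) / len(probs)


if __name__ == "__main__":
    # Hypothetical toy batch: eight easy negatives and two hard examples.
    probs = [0.05] * 8 + [0.93, 0.27]
    targets = [0] * 8 + [0, 1]
    w = ghm_weights(probs, targets)
    print(w[0], w[8])  # 0.125 1.0
    print(ghm_c_loss(probs, targets))
```

In this toy batch the eight easy negatives share one crowded bin and each receives a weight of 0.125, while the two hard examples sit alone in their bins and keep a weight of 1.0. Production implementations typically normalize by the number of non-empty bins and may smooth the bin counts with a moving average, which this sketch omits.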

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Funding

This study was funded by the National Natural Science Foundation of China (Young Scientists Fund), grant 618002044.

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
Yusen Zhang, Zhiqing Huang & Keyan Ren
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
Yanxin Zhang

Contributions

YZ and ZH conceived of and designed the experiments. YZ performed the experiments and analyzed the data. ZH, YZ, YZ, and KR wrote this paper together.

Corresponding author

Correspondence to Zhiqing Huang.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, Y., Huang, Z., Zhang, Y. et al. A detector for page-level handwritten music object recognition based on deep learning. Neural Comput & Applic 35, 9773–9787 (2023). https://doi.org/10.1007/s00521-023-08216-6

Received: 14 December 2021
Accepted: 06 January 2023
Published: 20 January 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00521-023-08216-6
